ABSTRACT


This report documents research activities which were 
conducted from 1 January 1983 to 30 September 1983 under the 
auspices of National Aeronautics and Space Administration 
Contract NCC 9-9. During this contract period the primary 
focus of research was on alternatives to sampling-theory 
stratified and regression estimators of crop production and 
timber biomass. An alternative estimator which is viewed 
as especially promising is the errors-in-variables regression
estimator. Investigations conducted during the course of this 
contract period established the need for caution with this 
estimator when the ratio of two error variances is not precisely 
known. One technical report on these investigations has been 
completed, a shorter version of which is being prepared for 
submission to a professional journal. In addition, further 
research topics on errors-in-variables estimation have been 
identified.


Richard F. Gunst 
Principal Investigator 
NASA Contract No. NCC 9-9 
I. RESEARCH ACTIVITIES

Research supported by this contract is directed toward the 
study of estimators of crop production and timber biomass, two of 
potentially many applications. The specific focus of these inves- 
tigations is on the development of methodologies which will enable 
satellite remote-sensing information to be combined with more 
accurate but more costly ground observation. Two of the classical
estimation techniques for combining satellite data with ground 
truth are sampling-theory stratified estimation and regression 
estimation. 

The assumptions which underlie the use of the two sampling-
theory estimators require that one of the sets of observations
(satellite or ground-truth measurements) be known exactly, i.e.,
without measurement error. In some applications it is unreasonable
to expect that either satellite or ground-truth observations will
be free from error. When this occurs a more appealing estimation
methodology assumes that an "errors-in-variables" regression model
underlies the relationship between satellite and ground-truth
measurements.

Section A below outlines the investigations which were 
conducted to assess the suitability of errors-in-variables models 
for application to the problems mentioned above. Section B 
describes related studies which were conducted on classical 
regression estimators. It is anticipated that these related
studies will be connected with errors-in-variables estimators 
in future investigations. 

A. Errors-in-Variables Models 

Let y denote a measurement (e.g., timber biomass) taken by 
ground observation and x a corresponding measurement obtained by 
satellite remote sensing. It is desirable to establish an empirical 
relationship between y and x so that the more readily obtained and 
less costly x measurements can be used to accurately estimate the 
more costly y values. Assuming that both y and x contain measure- 
ment error, a linear "errors-in-variables" regression model can be 
formulated as follows. 

Denote the true (i.e., error free) ground-truth measurement by
Y and the corresponding true satellite measurement by X. Assume
that an adequate approximation to the relationship between Y and
X is given by the linear model

$$Y = \alpha + \beta X.$$

In this setting Y and X cannot be observed because of measurement
error; rather, one observes

$$y = Y + v \quad \text{and} \quad x = X + u,$$

where v and u are the measurement errors. In this framework the
usual least squares estimators of $\alpha$ and $\beta$ are biased since an
underlying assumption for least squares estimation is that the
predictor variable X is measured without error.
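
The bias can be seen in a small simulation. The sketch below is illustrative only: the parameter values, the normal distributions, and the helper names `ls_slope` and `simulate_ls_slope` are assumptions made for this example and are not taken from the contract research. With the error variance of x set equal to the variance of X, the least squares slope settles near half of the true slope rather than near the true slope.

```python
import random


def ls_slope(xs, ys):
    """Ordinary least squares slope from the regression of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_ls_slope(n, alpha, beta, var_X, var_u, var_v, seed=0):
    """Generate x = X + u and y = Y + v with Y = alpha + beta*X, then fit y on x."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        X = rng.gauss(0.0, var_X ** 0.5)   # true (unobservable) predictor
        Y = alpha + beta * X               # true (unobservable) response
        xs.append(X + rng.gauss(0.0, var_u ** 0.5))
        ys.append(Y + rng.gauss(0.0, var_v ** 0.5))
    return ls_slope(xs, ys)


# Hypothetical values: true slope 3, equal variances for X and u.
slope = simulate_ls_slope(20000, alpha=1.0, beta=3.0,
                          var_X=5.0, var_u=5.0, var_v=5.0)
# Large-sample value of the least squares slope: beta*var_X/(var_X + var_u).
limit = 3.0 * 5.0 / (5.0 + 5.0)
print(slope, limit)
```

The fitted slope is attenuated toward zero by the factor var_X/(var_X + var_u), which is why measurement error in the predictor cannot be ignored.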
Most of the literature on errors-in-variables estimation is
concerned with establishing conditions under which consistent
estimators of $\alpha$ and $\beta$ exist. If u, v, and X are assumed normally
distributed and all model parameters are unknown, consistent
estimators do not exist. If one or more of the model parameters are
known (e.g., variances of the measurement errors), consistent
estimators of $\alpha$ and $\beta$ are ordinarily available. In particular, if
the ratio of the error variances, $\lambda = \sigma_v^2/\sigma_u^2$, is known then
consistent estimators of $\alpha$ and $\beta$ exist.

While the theoretical existence of consistent estimators of $\alpha$
and $\beta$ has been an important topic of study, very little research has
been conducted on (i) the effects of assuming an incorrect value for
a model parameter and (ii) the construction of consistent estimators
when X or u and v are assumed to be nonnormally distributed. A
major achievement of the research conducted under this contract is
an extensive investigation of the effects of assuming an incorrect
value for the error variance ratio $\lambda$. The results of this
investigation are reported in a manuscript entitled "Sensitivity of
Errors-in-Variables Estimators to the Specification of the Ratio of
Error Variances" which is appended to this report. In the near future
this manuscript will be submitted for publication to a scientific
journal.

The second research topic is currently being explored. The
literature on errors-in-variables estimators establishes the
existence of consistent estimators of $\alpha$ and $\beta$ when X or (u,v) is
nonnormal but provides no guidance on how to construct consistent
estimators. Maximum likelihood estimation is generally intractable.
Moment estimators exist and are consistent but moment estimation is
known to be inefficient for finite sample sizes. An alternative
approach which appears promising is outlined in Section II of
this report.

B. Related Research 

Least squares estimators are known to be seriously affected by 
the presence of outliers and collinearities, even if the requisite
model assumptions are valid. The principal investigator has been 
actively investigating topics of importance to an understanding of 
outliers and collinearities over the past several years and is 
continuing to do so under this contract. It is anticipated that 
the results of these investigations will have an important impact 
on the application of errors-in-variables estimation, especially 
since the model framework admits the possible presence of outliers 
through the error terms. 

Two manuscripts were completed on these topics during the 
current contract period. One manuscript is an invited critique 
of a manuscript on collinearity measures which will be published 
in the May 1984 issue of The American Statistician. The second 
manuscript presents new results on outlier diagnostics for ridge 
regression and smoothing spline estimators. This manuscript has 
been submitted for publication. 
II. PROSPECTIVE FUTURE RESEARCH

As mentioned in the previous section, an important topic 
of research on errors-in-variables models is the construction 
of estimators when the true (unobservable) predictor variable X 
or the error terms u, v are not normally distributed. For example, 
in estimating crop proportions both y and x are bounded by the 
interval [0,1]. The study of theoretical properties of errors-in- 
variables estimators under the assumption that X follows a prob- 
ability distribution over the unit interval (e.g., uniform or beta) 
would appear to be more reasonable than assuming an (unbounded) 
normal distribution. 

Likelihood functions for (y,x) when X is nonnormal are generally
theoretically intractable and fraught with computational difficulties.
Moment estimators are easy to deal with but ordinarily inefficient
for finite sample sizes. An alternative to maximum likelihood or
moment estimation which is potentially fruitful for both research
and application is "pseudo maximum likelihood" estimation (e.g., Gong
and Samaniego, Annals of Statistics, 1981). This approach replaces all
nuisance parameters in the likelihood function by consistent estimators
of the corresponding parameters and then maximizes the likelihood
function with respect to the parameter(s) of interest. Future research
will investigate asymptotic properties of pseudo maximum likelihood
estimators and compare their properties with those of moment and least
squares estimators.

Once viable estimation methodologies are available under 
feasible model assumptions, errors-in-variables estimators will 
be compared with their sampling-theory counterparts. It is
intended that both theoretical and empirical (using actual satellite 
and ground-truth measurements) comparisons will be conducted. 
III. PRESENTATIONS AND OTHER ACTIVITIES


An oral presentation of preliminary results achieved under
the support of this contract was made at a conference held at
the Johnson Space Center in September, 1983. Further presen-
tations are planned for future conferences at the
Johnson Space Center. In addition, oral presentations are planned 
for national and regional meetings of the American Statistical 
Association, including the 1984 Annual Meetings next August in 
Philadelphia, PA. 

During this contract period one advanced statistics graduate 
student in the Department of Statistics, Southern Methodist 
University, was supported by contract funds. Mani Y. Lakshminarayanan 
is currently conducting dissertation research on errors-in-variables 
models under the direction of the Principal Investigator. The 
investigations discussed in this report are a result of the colla- 
boration between Mr. Lakshminarayanan and the Principal Investigator. 


IV. TITLES OF COMPLETED RESEARCH 


"Sensitivity of Errors-in-Variables Estimators to the 
Specification of the Ratio of Error Variances," revised 
manuscript under preparation for submission to a professional 
journal (with M. Y. Lakshminarayanan).

"Toward a Balanced Assessment of Collinearity Diagnostics," 
The American Statistician, 38 (to appear, May 1984).

"Regression Diagnostics and Approximate Inference Procedures 
for Penalized Least Squares Estimators," submitted for 
publication (with R. L. Eubank).
SENSITIVITY OF ERRORS-IN-VARIABLES ESTIMATORS TO THE
SPECIFICATION OF THE RATIO OF ERROR VARIANCES 


Richard F. Gunst and Mani Y. Lakshminarayanan
Department of Statistics 
Southern Methodist University 
Dallas, Texas 75275 

1. INTRODUCTION 

Crop area estimation and the estimation of timber biomass are 
two applications of satellite remote sensing. Estimates obtainable 
with current technology often are not sufficiently precise for geo- 
graphical regions which are as small as Crop Reporting Districts or 
counties. In order to improve the precision of these estimates based 
solely on remote-sensing information, field measurements are taken on
relatively small portions of the geographical areas of interest. 
Stratified (sampling-theory) and regression estimation (e.g., Cochran 
1963) are two statistical methodologies which can be used to combine 
the satellite information with that collected on the ground. In 
particular, regression estimation based on "errors-in-variables"
(EV) models is viewed as an especially promising alternative for
increasing the precision of satellite remote-sensing estimates of 
crop and biomass area. Most applications of EV estimation, however, 
require knowledge of a ratio of the variances of the measurement 
errors in order to obtain consistent estimates of the unknown re- 
gression coefficients. In this paper the sensitivity of EV estima- 
tors to the selection of the ratio of error variances is investigated. 
Errors-in-variables models are appropriate when both variables
in a regression model are subject to measurement error. Thus, a
theoretical regression model specifying a relationship between a
response variable (e.g., "ground-truth" crop or biomass area
measurements) and a predictor variable (e.g., satellite area
measurements) might be defined as follows:

$$Y = \alpha + \beta X + e, \tag{1.1}$$

where Y and X denote the true response- and predictor-variable values,
respectively, and e is an unknown model specification error. In
practice, Y and X cannot be measured exactly; rather, one observes

$$x = X + u \quad \text{and} \quad y = Y + v, \tag{1.2}$$

where u and v are measurement errors which may be correlated. Model
(1.1) can now be expressed in terms of the observable quantities y
and x as:

$$y = \alpha + \beta x + ([v + e] - \beta u). \tag{1.3}$$

Note that in model (1.3) the specification and measurement errors of
the response variable (i.e., e and v) are additive and are not
separately estimable. Consequently, in this investigation the
specification error is assumed zero or negligible relative to the
measurement errors and the following reduced model is considered:

$$y = \alpha + \beta x + (v - \beta u); \tag{1.4}$$

equivalently, the EV model specification incorporates equations
(1.1) and (1.2) with e = 0.

Certain types of replication allow estimation of all EV 
model parameters (see Kendall and Stuart 1977, Chapter 29). Likewise,
measurement of additional variables which are correlated with
the predictor variable X but not with the measurement errors can 
also allow consistent estimation of all model parameters (e.g., 
Durbin 1954; Feldstein 1974; Sargan 1958). Without either repli- 
cation or the measuring of additional variates, it is possible to 
consistently estimate all EV model parameters only when one or more 
(functions of) the unknown parameters is known. 

In Section 2 of this paper EV estimators are derived under 
normality assumptions and their lack of consistency is examined. 

The sensitivity of EV estimators when the ratio of the error 
variances is assumed known, the most prevalent side condition which 
is imposed to assure consistency, is examined in Section 3 by eval- 
uating the derivative of the EV slope estimator under a variety of 
probabilistic assumptions on the unknown variance ratio. Section 4 
presents a simulation study investigating the mean squared error 
properties of the EV slope estimator for a grid of assumed and true 
values of the unknown ratio of error variances. Concluding remarks 
are made in Section 5. 

2. MAXIMUM LIKELIHOOD ESTIMATION 

A distinction must be made between two assumptions about the 
true (unknown) response and predictor variables in model (1.1) be- 
fore maximum likelihood estimators can be derived and their appro- 
priateness evaluated. "Functional" EV models stipulate that the 
true underlying variates are constants whereas "structural" EV 
models assume that (Y,X) are realizations of some joint probability 
distribution (e.g., Kendall and Stuart 1977, Chapter 29; Moran
1971). In this paper only the latter specification is studied;
in particular, assume that

$$X \sim N(\mu, \sigma_X^2), \tag{2.1}$$

which, from (1.1) (with e = 0), necessarily implies normality for
Y. In addition, it is common to assume that the measurement errors
are jointly normally distributed, independently of (Y,X). Although
correlation between the two measurement errors does not add
substantive complexity, for simplicity and ease of presentation it is
assumed that u and v are uncorrelated with

$$u \sim N(0, \sigma_u^2) \quad \text{and} \quad v \sim N(0, \sigma_v^2). \tag{2.2}$$


Under these assumptions, differentiation of the likelihood 
function results in the following system of maximum likelihood 
estimating equations: 


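$$
\begin{aligned}
\bar{x} &= \hat{\mu}, &\qquad s_x^2 &= \hat{\sigma}_X^2 + \hat{\sigma}_u^2, \\
\bar{y} &= \hat{\alpha} + \hat{\beta}\hat{\mu}, &\qquad s_y^2 &= \hat{\beta}^2\hat{\sigma}_X^2 + \hat{\sigma}_v^2, \\
 & &\qquad s_{xy} &= \hat{\beta}\hat{\sigma}_X^2.
\end{aligned}
\tag{2.3}
$$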

In equations (2.3) $s_x^2$ and $s_y^2$ are the sample variances of x and y,
respectively, and $s_{xy}$ is their sample covariance. There are six
EV model parameters which must be obtained from these five estimating
equations; equivalently, there are five sufficient statistics from
which to estimate all six model parameters. Much theoretical work
has been conducted to determine whether the six model parameters
are estimable under the normality assumptions (2.1) and (2.2).
As will be detailed in Section 2.2, without additional knowledge 
about one or more of the model parameters it is impossible to 
consistently estimate all six parameters under the above nor- 
mality assumptions. Before discussing the reasons for this lack 
of estimability, consider the solutions to equations (2.3) when 
one or both of the measurement error variances is known. 

2.1 Estimation with Known Measurement Error Variances 

Maximum likelihood estimates are the solutions to equations
(2.3), provided that the solutions fall within the parameter space
of the joint distribution of (X,u,v). Estimation of $\mu$, $\alpha$, and $\beta$
poses no parameter space difficulties since the parameter space for
each is the entire real line. Estimation of the variances $\sigma_X^2$, $\sigma_u^2$,
and $\sigma_v^2$ requires that the solutions be nonnegative, leading to the
following set of inequalities for the individual estimates:


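$$
\begin{array}{lll}
\text{For } \hat{\sigma}_u^2 \ge 0: & \text{(i)} & \hat{\beta} s_x^2 \ge s_{xy} \\
\text{For } \hat{\sigma}_v^2 \ge 0: & \text{(ii)} & s_y^2 \ge \hat{\beta} s_{xy} \\
\text{For } \hat{\sigma}_X^2 \ge 0: & \text{(iii)} & s_x^2 - \hat{\sigma}_u^2 \ge 0 \\
 & \text{(iv)} & s_y^2 - \hat{\sigma}_v^2 \ge 0 \\
 & \text{(v)} & \text{if } \hat{\sigma}_X^2 > 0,\ \operatorname{sign}(\hat{\beta}) = \operatorname{sign}(s_{xy}); \\
 & & \text{if } \hat{\sigma}_X^2 = 0,\ \hat{\beta} \text{ is indeterminate.}
\end{array}
\tag{2.5}
$$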


In the remainder of this paper it is assumed that solutions to
equations (2.3) satisfy these inequalities; refer to Kendall and
Stuart (1977) for alternatives when inequalities (2.5) are not
satisfied.
Maximum likelihood solutions to equations (2.3) impose two
implicit restrictions on the estimator of $\beta$:

$$\hat{\beta}(s_x^2 - \sigma_u^2) = s_{xy} \quad \text{and} \quad s_y^2 - \sigma_v^2 = \hat{\beta} s_{xy}, \tag{2.6}$$

with estimates replacing parameter values in (2.6) depending on
which parameters are assumed known. Since both variances are
nonnegative, equations (2.6) lead to the following inequality
on the EV slope estimator:

$$\frac{|s_{xy}|}{s_x^2} \le |\hat{\beta}| \le \frac{s_y^2}{|s_{xy}|}, \tag{2.7}$$

provided that $\hat{\beta}, s_{xy} \ne 0$ (which occurs with probability one). The
lower limit in inequality (2.7) is the least squares slope estimate 
from the regression of y on x and is attained when it is known that 
no measurement error occurs with the predictor variable. The upper 
limit is the inverse of the least squares slope estimate from the 
regression of x on y. The upper limit is attained when it is known 
that no measurement error occurs with the response variable. 
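
As a quick numerical illustration of these two limits, the sketch below computes them from sample moments. The function name `slope_limits` and the moment values are hypothetical, chosen only for the example; the moments correspond to a true slope of 3 with all three variances equal to 5.

```python
def slope_limits(s_xx, s_yy, s_xy):
    """Return (lower, upper) limits on the magnitude of the EV slope estimate.

    lower: magnitude of the least squares slope from regressing y on x
    upper: magnitude of the reciprocal of the least squares slope from
           regressing x on y
    """
    if s_xy == 0:
        raise ValueError("limits are undefined when the sample covariance is zero")
    lower = abs(s_xy) / s_xx
    upper = s_yy / abs(s_xy)
    return lower, upper


# Hypothetical moments: s_xx = 5 + 5, s_yy = 9*5 + 5, s_xy = 3*5.
lower, upper = slope_limits(s_xx=10.0, s_yy=50.0, s_xy=15.0)
print(lower, upper)   # the true slope 3 lies between the two limits
```

In this example the interval is wide (1.5 to about 3.33), which shows how little the limits alone say about the slope when the measurement error is substantial.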

Several authors have attempted to circumvent the lack of 
estimability of $\beta$ by defining estimators which are functions of
the limits in inequality (2.7). Gini (1921) proposed the arithmetic
mean of the limits. Teisser (1948) and Kaila (1980) suggested using 
the geometric mean of the limits. Pal (1980) showed that neither 
of these proposed estimators is consistent; moreover, he argued 
that any EV slope estimator which is intermediate to the two limits 
is optimal for some value of the ratio of the error variances but 
not optimal for others. 

Since the main interest in EV model estimation centers on the
relationship between the true response and predictor variables,
estimation of $\alpha$, $\beta$, $\mu$, and $\sigma_X^2$ is of paramount importance. Thus some
knowledge of the measurement error variances is required to solve
equations (2.3). There are four special cases which can arise.

Case 1: $\sigma_u^2$ Known

In this case,

$$\hat{\beta} = s_{xy}/(s_x^2 - \sigma_u^2). \tag{2.8}$$

Observe that if $\sigma_u^2 = 0$, this EV slope estimator is the usual least
squares estimator and is equal (in magnitude) to the lower bound
in (2.7).


Case 2: $\sigma_v^2$ Known

In this case,

$$\hat{\beta} = (s_y^2 - \sigma_v^2)/s_{xy}. \tag{2.9}$$

If $\sigma_v^2 = 0$, this EV slope estimator is the reciprocal of the least
squares estimator from the regression of x on y and is equal
(in magnitude) to the upper bound in (2.7).


Case 3: $\lambda = \sigma_v^2/\sigma_u^2$ Known

This assumption is the most frequently cited means of resolving
the lack of a unique solution to equations (2.3). When this
assumption is made, all the restrictions in (2.5) are satisfied
unless $s_{xy} = 0$, which occurs with probability zero. In addition,
this assumption does not require explicit knowledge of the exact
value of either of the measurement error variances, only the
relative magnitude of the variances. Often it is reasonable to assume
the measurement errors are of the same magnitude so that $\lambda = 1$. The
resulting EV slope estimator is

$$\hat{\beta} = \left[(s_y^2 - \lambda s_x^2) + \left\{(s_y^2 - \lambda s_x^2)^2 + 4\lambda s_{xy}^2\right\}^{1/2}\right] / (2 s_{xy}). \tag{2.10}$$
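
The estimator for a known variance ratio is simple to compute from the three sample moments. The following is a minimal sketch, assuming the moments have already been calculated; the function name `ev_slope` and the numerical values are hypothetical and serve only to illustrate the formula.

```python
import math


def ev_slope(s_xx, s_yy, s_xy, lam):
    """EV slope estimate for an assumed error variance ratio lam = var_v/var_u."""
    if s_xy == 0:
        raise ValueError("the slope is indeterminate when the sample covariance is zero")
    d = s_yy - lam * s_xx
    return (d + math.sqrt(d * d + 4.0 * lam * s_xy ** 2)) / (2.0 * s_xy)


# Hypothetical moments generated by slope 3 with var_X = var_u = var_v = 5,
# so the correct ratio is lam = 1.
s_xx, s_yy, s_xy = 10.0, 50.0, 15.0
print(ev_slope(s_xx, s_yy, s_xy, 1.0))   # recovers the slope 3
print(ev_slope(s_xx, s_yy, s_xy, 0.0))   # s_yy/s_xy, reciprocal of the x-on-y slope
print(ev_slope(s_xx, s_yy, s_xy, 1e6))   # close to s_xy/s_xx, the y-on-x slope
```

The two extreme choices of the ratio reproduce the two regression slopes that bound the estimator, so every intermediate estimate corresponds to some assumed value of the ratio.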

Case 4: Both $\sigma_u^2$ and $\sigma_v^2$ Known

In this case, equations (2.3) result in two estimators of $\beta$; viz.,
estimators (2.8) and (2.9). Depending on whether inequalities (2.5)
(iii) or (iv) are satisfied (with $\sigma_u^2$ and $\sigma_v^2$ replacing $\hat{\sigma}_u^2$ and $\hat{\sigma}_v^2$),
$\hat{\beta}$ is either the solution to (2.10) or indeterminate (see Birch 1964).


2.2 Identifiability Under Normal Assumptions 

The maximum likelihood estimating equations (2.3) are derived
under the assumption that the predictor variable X and the
measurement errors u and v are normally distributed, assumptions
(2.1) and (2.2). Not only does this result in estimating equations
which produce nonunique solutions, but also the parameter $\beta$ is
"nonidentifiable" in the joint distribution of (y,x). Identifiability
is a distributional property which requires that only one set of
parameters can give rise to any specific distribution of the observed
random variables. Under assumptions (2.1) and (2.2), the joint
distribution of (y,x) is bivariate normal with

$$
\begin{aligned}
\mu_y &= \alpha + \beta\mu, &\qquad \sigma_x^2 &= \sigma_X^2 + \sigma_u^2, \\
\sigma_y^2 &= \beta^2\sigma_X^2 + \sigma_v^2, &\qquad \sigma_{xy} &= \beta\sigma_X^2.
\end{aligned}
\tag{2.11}
$$
That this joint distribution is nonidentifiable can be demonstrated
by the following sets of parameters from the distributions of (X,u,v),
each of which produces a bivariate normal distribution for (y,x) with
$\mu_y = \nu$, $\mu_x = \mu$, $\sigma_y^2 = \sigma_x^2 = 1$, and $\rho_{xy} = .5$ (Madansky 1959):

(a) $\sigma_X^2 = 1/2$, $\sigma_u^2 = 1/2$, $\sigma_v^2 = 1/2$, $\beta = 1$, $\alpha = \nu - \mu$

(b) $\sigma_X^2 = 1/3$, $\sigma_u^2 = 2/3$, $\sigma_v^2 = 1/4$, $\beta = 3/2$, $\alpha = \nu - 3\mu/2$.
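
The two parameter sets can be checked with exact rational arithmetic. The sketch below is only a verification aid; the function name `observable_moments` is hypothetical, and it assumes the parameter sets read as (variance of X, variance of u, variance of v, slope) = (1/2, 1/2, 1/2, 1) and (1/3, 2/3, 1/4, 3/2).

```python
from fractions import Fraction as F


def observable_moments(var_X, var_u, var_v, beta):
    """Second moments of the observable pair (x, y) implied by the EV model."""
    var_x = var_X + var_u                # variance of x = X + u
    var_y = beta ** 2 * var_X + var_v    # variance of y = alpha + beta*X + v
    cov_xy = beta * var_X                # covariance of x and y
    return var_x, var_y, cov_xy


set_a = observable_moments(F(1, 2), F(1, 2), F(1, 2), F(1))
set_b = observable_moments(F(1, 3), F(2, 3), F(1, 4), F(3, 2))
print(set_a, set_b)   # identical moments from two different slopes
```

Both sets give unit variances and covariance 1/2, so no amount of data on (y,x) alone can distinguish a slope of 1 from a slope of 3/2 under the normality assumptions.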

Geary (1942) showed that when (u,v) are jointly normally
distributed, if X possesses a finite nonzero cumulant of order
greater than two then $\beta$ is identifiable in the joint distribution of
(y,x); thus, nonnormal distributions for X generally allow maximum
likelihood estimation of $\beta$. Reiersol (1950) strengthened this result
by proving that when the distribution of (u,v) is bivariate normal,
nonnormality of X is a necessary and sufficient condition for
identifiability of $\beta$. He also showed that if the distribution of X is
normal, a necessary and sufficient condition for identifiability
of $\beta$ is that neither the distribution of u nor that of v is
divisible by a normal distribution. Further, Reiersol established that
once $\beta$ is identifiable, so is $\alpha$. If $\beta$ is identifiable, he proved
that a necessary and sufficient condition for identifiability
of the other model parameters is that (i) the distribution of X (Y)
is not divisible by a normal distribution and (ii) either u or v
is identically zero. These important results on identifiability
are summarized in Table 2.1.



3. INFLUENCE OF $\lambda$ ON THE EV SLOPE ESTIMATOR

The results of the previous section demonstrate that auxiliary
knowledge must be available in order to estimate all six model
parameters when normality of (X,u,v) is assumed. The most common
assumption which is made is that the ratio of the measurement error
variances, $\lambda$, is known. This allows estimation of $\beta$ with assurance
that the requisite restrictions (2.5) on the parameter estimates will
hold. Likewise, this assumption does not require explicit knowledge
of either of the measurement error variances.

Published research on EV model estimation has concentrated more
on the existence of consistent estimators of $\beta$ under various
alternative assumptions than on the sensitivity of the resulting
estimators to the assumptions. The dearth of sensitivity studies is
surprising in light of the known lack of identifiability of $\beta$ under
the normality assumptions. The need for an evaluation of the
sensitivity of equation (2.10) to the value of $\lambda$ derives not only
from the uncertainty of the robustness of the estimator to the choice
of $\lambda$ but also from parallel studies of other estimators which are
similarly dependent on an unknown ratio of variances such as MINQUE
variance component estimation. These latter studies (e.g., Hess
1979) have demonstrated that estimators which depend on selection
of variance ratios can be affected by the choice of the ratio.
No such guidance has been reported for EV parameter estimation.

Consider now the derivative of $\hat{\beta}$ with respect to $\lambda$.
Asymptotically (i.e., replacing the sample moments by their
corresponding parameter values),

$$\frac{\partial\hat{\beta}}{\partial\lambda} = -\beta t/(\beta^2 + \lambda), \tag{3.1}$$

where $t = \sigma_u^2/\sigma_X^2$ is the "noise-to-signal ratio" for the observable
predictor variable x. From equation (3.1) one can readily see that
the rate of change of $\hat{\beta}$ with respect to $\lambda$ is not only a function of
the value of $\lambda$ but also of the true parameter value $\beta$ and the
noise-to-signal ratio. If the noise-to-signal ratio is sufficiently
small, equation (3.1) reveals that $\hat{\beta}$ will be relatively insensitive
to the value of the true variance ratio $\lambda$. In addition, if for fixed t
the true variance ratio $\lambda$ is sufficiently large, $|\partial\hat{\beta}/\partial\lambda|/\beta$ will be
relatively insensitive to the specific value of $\lambda$, especially if $\beta$
is large.
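
Equation (3.1) can be checked numerically by differentiating the slope estimator (2.10) with the sample moments replaced by their parameter values. The sketch below does this with a central difference; the function names and the parameter values are assumptions made for the illustration.

```python
import math


def ev_slope(s_xx, s_yy, s_xy, lam):
    """EV slope estimate, equation (2.10), for an assumed variance ratio lam."""
    d = s_yy - lam * s_xx
    return (d + math.sqrt(d * d + 4.0 * lam * s_xy ** 2)) / (2.0 * s_xy)


def asymptotic_derivative(beta, t, lam):
    """Right-hand side of equation (3.1)."""
    return -beta * t / (beta ** 2 + lam)


def numerical_derivative(beta, var_X, var_u, lam, h=1e-5):
    """Central-difference derivative of (2.10) in the assumed ratio.

    The moments are held at the values implied by the true ratio lam;
    only the ratio supplied to the estimator is perturbed.
    """
    s_xx = var_X + var_u
    s_yy = beta ** 2 * var_X + lam * var_u   # var_v = lam * var_u
    s_xy = beta * var_X
    up = ev_slope(s_xx, s_yy, s_xy, lam + h)
    down = ev_slope(s_xx, s_yy, s_xy, lam - h)
    return (up - down) / (2.0 * h)


# Hypothetical configuration: beta = 3, var_X = var_u = 5 (t = 1), lam = 1.
formula = asymptotic_derivative(beta=3.0, t=1.0, lam=1.0)
numeric = numerical_derivative(beta=3.0, var_X=5.0, var_u=5.0, lam=1.0)
print(formula, numeric)   # both are close to -0.3
```

The negative sign reflects that the estimate decreases toward the least squares slope as the assumed ratio increases.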

That the estimator (2.10) can be extremely sensitive to the
value of $\lambda$ is illustrated in Figure 3.1. This figure graphs the
(absolute) proportional rate of change of $\hat{\beta}$, $t/(\beta^2 + \lambda)$, as a
function of $\lambda$ for two choices of the noise-to-signal ratio t and two
choices of the true parameter $\beta$. The figure confirms that $\hat{\beta}$ is most
sensitive to the choice of $\lambda$ when t is large. For fixed t, the
estimator is less sensitive to the choice of $\lambda$ when $\lambda$ is large,
especially if coupled with a large $\beta$. In other words, under the
precise conditions for which EV estimation is most often proposed
(i.e., t moderate to large and $\lambda$ small to moderate, each condition
implying that $\sigma_u^2$ is nonnegligible) the EV slope estimator is
extremely sensitive to the true value of $\lambda$.
A somewhat different perspective on the sensitivity of $\hat{\beta}$ to
the value of $\lambda$ is obtained by assuming that $\lambda$ is stochastic rather
than deterministic. Lindley and El-Sayyad (1968) suggest using a
Uniform($k^{-1}$, k) prior distribution for the error variance ratio if
one believes that the two measurement errors are of the same
magnitude. In addition, one might propose N(k, $\sigma^2$) or Chi-square(k)
priors as reasonable alternatives in order to study the sensitivity
of $\hat{\beta}$ to a variety of suspected prior distributions.

Given any of the above prior distributions for $\lambda$, one would
like to evaluate the expected rate of change of $\hat{\beta}$ with respect to
that prior; i.e., the expectation of (3.1) with respect to the
prior on $\lambda$. Closed-form expectations do not ordinarily exist;
however, the following theorem (e.g., Bishop, Fienberg, and Holland
1975, p. 493) allows approximate expectations to be determined.

Theorem 3.1 (Method of Statistical Differentials)

Let $g(x_1, x_2, \ldots, x_p)$ be a real-valued continuous function with
continuous first and second derivatives at the point $(\mu_1, \mu_2, \ldots, \mu_p)$.
Let $\bar{x}_n = \{\bar{x}_{1n}, \bar{x}_{2n}, \ldots, \bar{x}_{pn}\}$ be a sequence of sample means of the
vector random variable $x = [x_1, x_2, \ldots, x_p]$. Finally, let $E[x] = \mu$
and let the distribution of x have finite third moments. Then

$$n^{1/2}[g(\bar{x}_n) - g(\mu)] \to N(0, \Delta),$$

where

$$\Delta = [g^{(1)}(\mu), \ldots, g^{(p)}(\mu)]\, \Sigma\, [g^{(1)}(\mu), \ldots, g^{(p)}(\mu)]',$$

$g^{(i)}(\mu)$ is the partial derivative of g(x) with respect to $x_i$
evaluated at $x = \mu$, and $\Sigma$ is the variance-covariance matrix of x.

Applying this theorem to the expectation of (3.1) under the
three priors listed above for $\lambda$, the following approximations are
obtained from a three-term Taylor expansion of $\partial\hat{\beta}/\partial\lambda$:

$\lambda \sim$ Uniform($k^{-1}$, k):

$$E\left|\frac{\partial\hat{\beta}}{\partial\lambda}\right| \approx 2\beta t\left\{(2\beta^2 + k + k^{-1})^{-1} + (k - k^{-1})^2 / [3(2\beta^2 + k + k^{-1})^3]\right\} \tag{3.2}$$

$\lambda \sim$ N(k, $\sigma^2$):

$$E\left|\frac{\partial\hat{\beta}}{\partial\lambda}\right| \approx \beta t\left\{(\beta^2 + k)^{-1} + \sigma^2(\beta^2 + k)^{-3}\right\} \tag{3.3}$$

$\lambda \sim$ Chi-square(k):

$$E\left|\frac{\partial\hat{\beta}}{\partial\lambda}\right| \approx \beta t\left\{(\beta^2 + k)^{-1} + 2k(\beta^2 + k)^{-3}\right\}. \tag{3.4}$$

Figures 3.2-3.4 depict the absolute proportional rate of change of
$\hat{\beta}$, $|\partial\hat{\beta}/\partial\lambda|/\beta$, under the Uniform, Normal and Chi-square priors using
equations (3.2)-(3.4). The Uniform prior displays the least
sensitivity to the value of the variance ratio while both the Normal
and the Chi-square priors produce large changes in $\hat{\beta}$, especially for
small values of k. In each case the sensitivity is least when k is
large and the noise-to-signal ratio t is small, as with the curves in
Figure 3.1.

In each of Figures 3.2 to 3.4 the proportional rate of change
of $\hat{\beta}$ is greatest when the parameter k of the prior distribution for
$\lambda$ is small. Thus if the measurement error in x tends to be much
smaller than that of y and all other model parameters are fixed,
the EV maximum likelihood estimator is relatively insensitive to
the exact value of $\lambda$; i.e., values of $\lambda$ over a fairly wide range
will result in similar, relatively small, estimator changes
whereas small values of $\lambda$ result in consequential estimator
changes for the same values of other model parameters.
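
The three approximations share one structure: each equals the rate of change evaluated at the prior mean plus one half of its second derivative times the prior variance. The sketch below, with hypothetical function names and parameter values, evaluates that generic second-order expansion and compares it with the displayed forms. For the Uniform prior the expectation has an elementary closed form (the integral of 1/(c + x) is a logarithm), which is included here purely as a check and is not part of the original derivation.

```python
import math


def second_order_expectation(mean, var, beta, t):
    """g(mean) + 0.5*g''(mean)*var for g(lam) = beta*t/(beta**2 + lam)."""
    g = beta * t / (beta ** 2 + mean)
    g2 = 2.0 * beta * t / (beta ** 2 + mean) ** 3
    return g + 0.5 * g2 * var


def uniform_approx(k, beta, t):
    """Equation (3.2): lam ~ Uniform(1/k, k)."""
    c = 2.0 * beta ** 2 + k + 1.0 / k
    return 2.0 * beta * t * (1.0 / c + (k - 1.0 / k) ** 2 / (3.0 * c ** 3))


def normal_approx(k, sigma2, beta, t):
    """Equation (3.3): lam ~ N(k, sigma2)."""
    return beta * t * (1.0 / (beta ** 2 + k) + sigma2 / (beta ** 2 + k) ** 3)


def chisq_approx(k, beta, t):
    """Equation (3.4): lam ~ Chi-square(k), mean k and variance 2k."""
    return beta * t * (1.0 / (beta ** 2 + k) + 2.0 * k / (beta ** 2 + k) ** 3)


def uniform_exact(k, beta, t):
    """Exact expectation of beta*t/(beta**2 + lam) for lam ~ Uniform(1/k, k), k > 1."""
    lo, hi = 1.0 / k, k
    return beta * t / (hi - lo) * math.log((beta ** 2 + hi) / (beta ** 2 + lo))


# Hypothetical values: beta = 3, t = 1, k = 2.
beta, t, k = 3.0, 1.0, 2.0
u_mean, u_var = (k + 1.0 / k) / 2.0, (k - 1.0 / k) ** 2 / 12.0
print(uniform_approx(k, beta, t), second_order_expectation(u_mean, u_var, beta, t))
print(uniform_exact(k, beta, t))
print(chisq_approx(k, beta, t), second_order_expectation(k, 2.0 * k, beta, t))
```

For these values the second-order approximation under the Uniform prior agrees with the exact expectation to several decimal places; the agreement can be expected to deteriorate as the prior variance grows relative to $\beta^2 + k$.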

The results of this section can be summarized concisely as
follows. First, if the measurement error in x is small relative
to the measurement error in y and small relative to the variability
in X (i.e., $\lambda$ large, t small), then the EV maximum likelihood
estimator of $\beta$, equation (2.10), will be relatively insensitive to
the exact value of $\lambda$; therefore, one would expect that $\hat{\beta}$ would be
relatively insensitive to erroneous selection of $\lambda$ in a
neighborhood of the true value. Second, when the measurement error in
x is large relative either to the error in y or to the variability in
X, then the EV estimator of $\beta$ is highly influenced by the true value
of $\lambda$ and, one would expect, by erroneous choices for $\lambda$ in equation
(2.10).

These results indicate that the selection of $\lambda$ can be critical
for accurate estimation of the slope parameter and one cannot merely
assume that any "close" guess for the variance ratio will provide
a suitable estimate. The simulation study reported in the next
section documents more explicitly the dependence of the estimator
on the correct choice of the measurement error variance ratio.

4. SIMULATION STUDY

Asymptotic properties of EV model estimators are often cited
with little regard to whether they are valid for finite sample
sizes. In particular, asymptotic variance formulae are used to
compare alternative estimators and to draw inferences on model
parameters. In this section results of a simulation study are
examined in order to (i) determine whether asymptotic variance
formulae are adequate approximations to the true variances for
finite samples, (ii) gauge the magnitude of the effects of
assuming an incorrect value for the measurement error variance
ratio $\lambda$, and (iii) assess the relative merits of least squares
and EV estimators.

The following simulation results fix the values of $\beta$, $\sigma_X^2$,
and $\sigma_u^2$ at 3.0, 5.0, and 5.0, respectively, so that by varying $\sigma_v^2$
the results are only a function of $\lambda$. Under the assumption of
a known (correct) variance ratio, the EV estimator of $\beta$, equation
(2.10), is asymptotically unbiased; i.e.,

$$\operatorname{plim}(\hat{\beta}) = \beta. \tag{4.1}$$

The asymptotic variance of $\hat{\beta}$ is well known (e.g., Robertson 1974;
Gleser 1981):

$$\operatorname{asvar}(\hat{\beta}) = n^{-1}\{(\beta^2 + \lambda)t + \lambda t^2\}. \tag{4.2}$$

In Tables 4.1 and 4.2, the mean and mean squared error of $\hat\beta$ are
compared to the asymptotic values (4.1) and (4.2) for N = 1,000
samples of size 20, 50 and 100. The ratios tabulated in Table 4.1
are the sample means of the 1,000 $\hat\beta$ values divided by 3. In
Table 4.2, the sample mean squared error, $\sum(\hat\beta-\beta)^2/N$, of $\hat\beta$ is
compared to the corresponding theoretical values calculated from
(4.2). In all cases, the sample means and mean squared errors
in Tables 4.1 and 4.2 are evaluated using the assumed values of
$\lambda$, and compared to equations (4.1) and (4.2) using the true value
of $\lambda$. In this way both the effect of sample size and the effect
of an incorrect choice of $\lambda$ can be assessed.
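
The design just described can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the program used for the report: equation (2.10) is not reproduced in this section, so the sketch assumes the standard closed-form EV slope estimator for a known ratio (called lam here) of the y-error variance to the x-error variance. The function names, the zero intercept, the normal distributions and the seed are all hypothetical choices.

```python
import numpy as np

def ev_slope(x, y, lam):
    """Assumed closed form of the EV slope estimate for a given variance ratio lam."""
    sxx = np.var(x)                                   # sample moments, divisor n
    syy = np.var(y)
    sxy = np.mean((x - x.mean()) * (y - y.mean()))
    d = syy - lam * sxx
    return (d + np.sqrt(d * d + 4.0 * lam * sxy ** 2)) / (2.0 * sxy)

def simulate(beta, var_X, var_u, lam_true, lam_assumed, n, reps, seed=0):
    """Return (mean of estimates / beta, sample MSE / asymptotic variance)."""
    rng = np.random.default_rng(seed)
    var_v = lam_true * var_u                          # error variance in y
    est = np.empty(reps)
    for r in range(reps):
        X = rng.normal(0.0, np.sqrt(var_X), n)        # true predictor values
        x = X + rng.normal(0.0, np.sqrt(var_u), n)    # observed predictor
        y = beta * X + rng.normal(0.0, np.sqrt(var_v), n)
        est[r] = ev_slope(x, y, lam_assumed)
    t = var_u / var_X                                 # noise-to-signal ratio
    asvar = ((beta ** 2 + lam_true) * t + lam_true * t ** 2) / n
    return est.mean() / beta, np.mean((est - beta) ** 2) / asvar

mean_ratio, mse_ratio = simulate(3.0, 5.0, 5.0, 1.0, 1.0, n=200, reps=1000)
```

Repeating the call over several sample sizes and assumed ratios gives quantities of the kind tabulated in Tables 4.1 and 4.2; passing lam_assumed different from lam_true corresponds to the off-diagonal entries.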

The entries in Table 4.1 corresponding to the correct assumed
values for $\lambda$ indicate close agreement between the average estimated
EV slope values and the true parameter values, especially when the
sample size is at least 50. In Table 4.2 agreement is not as good
for small sample sizes but appears to be adequate for samples of
size 100 if $\lambda$ is not too large. It would appear that a sample of
size 200 would be adequate for acceptable agreement between the
sample variance and the asymptotic variance. Tables 4.3 and 4.4
exhibit comparisons of sample mean squared errors to asymptotic
variances for 1,000 samples of size 200 for three values of $\beta$ and
two choices of the noise-to-signal ratio t. In most of the model
configurations the agreement is quite adequate when the correct
value of the variance ratio is assumed, especially when $\beta$ is large
and t is small.

The off-diagonal elements of Tables 4.1 to 4.4 reveal the
effects of incorrectly guessing the variance ratio $\lambda$. Incorrectly
assuming too large a value of $\lambda$ results in underestimation of $\beta$
while the reverse is true when $\lambda$ is assumed too small (Table 4.1).
Any incorrect guess for $\lambda$ produces an over-estimate (except for a
few ratios which are within sampling error) for the asymptotic
mean squared error of $\hat\beta$, but it is far more serious to guess too
small a value for $\lambda$ than too large a one (Tables 4.2-4.4).
Small sample sizes yield very erratic results; when n = 20, the
small sample estimates are unreliable as measures of asymptotic mean
squared error (recall that the EV estimator is asymptotically unbiased).
Even with larger sample sizes the sample mean squared errors are only
reliable estimates of the asymptotic variances when the assumed value
of $\lambda$ is in a narrow interval around the true error variance ratio.

For samples of size 200, the agreement between sample and asymptotic
mean squared errors is adequate if the assumed $\lambda$ is between approximately
half and double the true ratio, especially (as the results of
the previous section suggested) when the true value of $\lambda$ is large,
the noise-to-signal ratio is small, and $\beta$ is large. While not confirmatory,
this empirical finding about the close agreement between
sample and asymptotic mean squared errors supports the use of Lindley
and El-Sayyad's (1968) uniform prior in studies of the properties of
EV slope estimators.

Another comparison which is of importance is that of the mean
squared error of the EV maximum likelihood estimator to that of the
least squares estimator, $\operatorname{mse}(\hat\beta)/\operatorname{mse}(\hat\beta_{LS})$. Table 4.5 displays the
ratios of the sample mean squared errors for the two estimators
using the same model configurations as in Tables 4.1 and 4.2. It
is evident from this table that $\hat\beta$ offers substantial improvement
over least squares unless the sample size is small or the assumed
value of $\lambda$ is much less than the true value. As the sample size
increases, only assumed values of $\lambda$ which are grossly smaller than
the true ones will lead to a preference for least squares over EV
estimation.

Lest these conclusions be affected by the inadequacy of
empirical mean squared errors as estimates of asymptotic mean
squared errors, Table 4.6 displays the ratios of the asymptotic
mean squared errors of the EV slope estimators to the corresponding
ones for least squares. The asymptotic mean squared
errors for the EV estimators are based on assuming an incorrect
value $\lambda^*$ for $\lambda$. The appropriate expressions for the two estimators
are:

    $$\operatorname{asmse}(\hat\beta_{LS}) = \beta^2 t\,(t+n^{-1})(1+t)^{-2} + n^{-1}\lambda(1+t^{-1})^{-1} \qquad (4.3)$$

and

    $$\begin{aligned}
    \operatorname{asmse}(\hat\beta) ={}& \big[-(\beta^2+\lambda^*) + (\lambda-\lambda^*)t + \{[(\beta^2-\lambda^*)+(\lambda-\lambda^*)t]^2 + 4\lambda^*\beta^2\}^{1/2}\big]^2 / 4\beta^2 \\
    &+ \{3\beta^2t^2(\lambda-\lambda^*)^2 + (\beta^4-4\lambda^*\beta^2+\lambda^{*2})[3\beta^2+(\lambda+\beta^2)t+\lambda t^2] \\
    &\quad - 3\beta^2(\beta^2-\lambda^*)^2 + 6\lambda^*\beta^2[\beta^2+(\lambda+\beta^2)t+\lambda t^2]\} / n(\beta^2+\lambda^*)^2 . \qquad (4.4)
    \end{aligned}$$

As is apparent from Table 4.6, the conclusions drawn from Table 4.5
remain valid when the sample mean squared errors are replaced by
asymptotic mean squared errors.
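
As a hedged sketch of how ratios such as those in Table 4.6 might be computed, the functions below transcribe expressions (4.2), (4.3) and (4.4) as best they can be read from this copy. The function and argument names are hypothetical (lam is the true variance ratio, lam_star the assumed one, t the noise-to-signal ratio). One consistency check is available: when lam_star equals lam the bias term of (4.4) vanishes and the remainder should collapse to (4.2). That check does not constrain the coefficient of the first term inside the braces, which is multiplied by zero in that case.

```python
import math

def asvar_ev(beta, lam, t, n):
    """Asymptotic variance (4.2) of the EV slope estimator, correct ratio assumed."""
    return ((beta ** 2 + lam) * t + lam * t ** 2) / n

def asmse_ls(beta, lam, t, n):
    """Asymptotic mean squared error (4.3) of the least squares slope."""
    return beta ** 2 * t * (t + 1.0 / n) / (1.0 + t) ** 2 + lam / (n * (1.0 + 1.0 / t))

def asmse_ev(beta, lam, lam_star, t, n):
    """Asymptotic mean squared error (4.4) when lam_star is assumed in place of lam."""
    b2 = beta ** 2
    g = (b2 - lam_star) + (lam - lam_star) * t
    bias = (-(b2 + lam_star) + (lam - lam_star) * t
            + math.sqrt(g * g + 4.0 * lam_star * b2)) / (2.0 * beta)
    core = b2 + (lam + b2) * t + lam * t ** 2
    braces = (3.0 * b2 * t ** 2 * (lam - lam_star) ** 2
              + (b2 ** 2 - 4.0 * lam_star * b2 + lam_star ** 2) * (2.0 * b2 + core)
              - 3.0 * b2 * (b2 - lam_star) ** 2
              + 6.0 * lam_star * b2 * core)
    return bias ** 2 + braces / (n * (b2 + lam_star) ** 2)

# Ratio of EV to least squares asymptotic MSE for one hypothetical configuration.
ratio = asmse_ev(3.0, 5.0, 2.5, 1.0, 100) / asmse_ls(3.0, 5.0, 1.0, 100)
```

A ratio below one favors EV estimation for that configuration.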

Finally, Figures 4.1 to 4.6 display asymptotic mean squared
error comparisons of $\hat\beta$ and $\hat\beta_{LS}$ using equations (4.3) and (4.2) or (4.4),
as appropriate. The horizontal axes in Figures 4.1 and 4.2 are the
variance proportions $\sigma_u^2/(\sigma_u^2+\sigma_X^2) = t/(1+t)$, a monotonic transformation
of the noise-to-signal ratio. As indicated by Figures 4.1 and 4.2,
for a fixed sample size and a fixed variance proportion the EV estimator
(2.10) using the correct value of $\lambda$ is preferable to least
squares only for small values of $\lambda$ (recall, small $\lambda$ implies the
error in y is comparable to or less than the error in x). As the
noise-to-signal ratio decreases, the range of values of the
variance ratio $\lambda$ for which $\hat\beta$ offers improvement over least squares
increases. Increasing the sample size enlarges the $(t,\lambda)$-region
for which EV estimators are preferable to least squares.

Figure 4.3 graphs the ratios of $\operatorname{asmse}(\hat\beta)$ to $\operatorname{asmse}(\hat\beta_{LS})$ using
equations (4.2) and (4.3), respectively, for four sample sizes and
$(\beta, \sigma_X^2, \sigma_u^2) = (3, 5, 10)$. Except for extremely small values of $\lambda$, the
ratio of asymptotic mean squared errors is less than unity, especially
so for large sample sizes. These graphs confirm the conclusions
drawn from Tables 4.5 and 4.6 for correct choices of $\lambda$.

Figures 4.4 to 4.6 graph the ratio of the asymptotic mean
squared errors using equation (4.4) for EV estimation with an
incorrect choice for $\lambda$. Model parameters for these figures are
$(\beta, \sigma_X^2) = (3, 5)$ and $\lambda = 1$, 6 and 10, respectively. Again the conclusions
drawn from Tables 4.5 and 4.6 are graphically confirmed
by these figures: unless $\lambda$ is selected much smaller than its true
value the EV estimator of $\beta$ is preferable to least squares.

5. CONCLUDING REMARKS

The results of Sections 3 and 4 establish the extreme sensitivity
of the EV maximum likelihood estimator to the choice of the
measurement error variance ratio. The sensitivity is model dependent
and is greater when the true variance ratio $\lambda$ is small and
the noise-to-signal ratio t is large. Large sample sizes enable
the EV estimator to offer improvement over least squares if the
assumed measurement error variance ratio is not too much smaller
than the true ratio; however, the reliable use of asymptotic
formulae for estimator variances requires that the variance ratio
be known within a narrow interval of the true value and that the
sample size be at least 200.

Throughout Sections 3 and 4 the EV slope estimator shows
least sensitivity to the choice of the variance ratio $\lambda$ when the
noise-to-signal ratio $t = \sigma_u^2/\sigma_X^2$ is small and $\lambda = \sigma_v^2/\sigma_u^2$ is large,
especially for large values of $\beta$. Together the conditions on $\lambda$
and t imply that there is relatively little error in the predictor
variable (i.e., $\sigma_u^2 \approx 0$). Thus, the model configurations for which
the EV slope estimator is relatively insensitive to the choice of
$\lambda$ are those for which least squares is most appropriate. In other
model configurations (i.e., when the error in x is not negligible)
the EV slope estimator exhibits demonstrable sensitivity to the
assumed value of the measurement error variance ratio.

In spite of these limitations on the application of EV estimation,
the simulation results and asymptotic mean squared error
comparisons in Section 4 indicate clear preference for EV estimation
over least squares. If sample sizes are at least 200, this
general conclusion is violated only when the assumed value of $\lambda$
is much less than the true value.

Little insight can be gained from this study relative to
the performance of EV maximum likelihood estimators under nonnormal
assumptions. While the parameters become identifiable
under the conditions stated in Table 2.1, analytic derivations
of estimators and asymptotic variances are intractable for most
alternatives to the normality assumptions. This important area
of research is currently under investigation.

Table 2.1 Identifiability Conditions for EV Models*

(a) Identifiability of $\beta$
    (i) If (u,v) is Normal, then X cannot be Normally Distributed
    (ii) If X is Normal, the distribution of neither u nor v can
         be divisible by a Normal Distribution

(b) $\beta$ is Identifiable
    (i) $\alpha$ is Identifiable
    (ii) All other Model Parameters are Identifiable iff
         (1) The distribution of X (Y) is not divisible
             by a Normal Distribution, and
         (2) Either u = 0 or v = 0

*All model parameters unknown.


Table 4.1: Ratio of Simulated and Asymptotic Expectations of EV Slope Estimator

Table 4.2: Ratio of Simulated and Asymptotic Mean Squared Errors of EV Slope Estimators

Table 4.4: Ratio of Simulated and Asymptotic Mean Squared Errors of EV Slope Estimator

Fig 4.2: Asymptotic MSE comparisons between LS and EV
estimators, Beta-squared = 5.0

TOWARD A BALANCED ASSESSMENT OF COLLINEARITY DIAGNOSTICS
Richard F. Gunst*

Periodically it is wise to review the foundation upon which
statistical methodology is based. With the availability of mainframe
and micro computer technology there is too great a tendency
to become more enamored with the sophistication with which statistical
analyses can be reported than with the theoretical underpinnings
of the results. Professor Belsley's article contributes
to a growing number of survey papers which attempt to refocus
attention on the assumptions underlying regression methodology as
it is practiced today (e.g., Draper and Van Nostrand 1979; Smith
and Campbell 1980; Hocking and Pendleton 1983). These articles
are especially noteworthy because they force investigators to
confront fundamental questions relating to one of the most difficult
and controversial problems facing data analysts: redundant
predictor variables in a regression analysis.

Professor Belsley criticizes the prevailing practice of
centering predictor variables (usually followed by scaling to
unit length) prior to assessing the presence and effects of
collinearity. He clarifies the position of Belsley, Kuh and
Welsch (1980) that predictor variables should be scaled to unit
length but not centered prior to diagnosing collinearity. He
argues unequivocally that collinearity diagnostics are only
meaningful when interpreted in terms of "basic variables" which
are "structurally interpretable." In keeping with the preference
endorsed in his book, he stresses the use of the condition index
as the only appropriate measure of collinearity.

Without hesitation I laud Professor Belsley's effort to redress
the lack of attention to the role of centering in discussions
of collinearity and his effort to create a framework within
which collinearity can be more rigorously examined. If I differ
with him on any of the issues which he raises, my divergence of
opinion rests primarily with the dogmatic insistence that there
is one correct technique within which discussions of collinearity
must be straitjacketed. Rather, I believe that many of the
technical issues he raises are related more to one's perspective,
education, and experience than necessarily to a correct technique
for the proper assessment of collinearity.

1. CONFLICTING PERSPECTIVES 

Although Professor Belsley repeatedly cautions against centering
when diagnosing collinearity, he is careful to point out
that there are legitimate circumstances under which centering is
appropriate. As has been argued elsewhere (Hocking 1983, with
discussion), it is common practice to center all experimental
designs and attendant analyses when fitting response surfaces.
Similarly, Marquardt (1980) argues that polynomial regression
coefficients derive their interpretability only when predictor
variables are centered. Bradley and Srivastava (1979) stress
that centered, symmetrically-located and equally-spaced values
of the predictor variables should be selected in any polynomial
regression analysis in which the investigator can control the
values of the variates. Thus there are wide classes of regression
problems for which centering is considered essential, even
if the data are collinear.

The major difference between the above illustrations and
the arguments posed by Professor Belsley is one of perspective.
The above illustrations are most relevant in industrial settings
where controlled experimentation is prevalent and constant terms
are known to be necessary for adequate model fits. Observation
rather than experimentation is more common in the economic studies
to which Professor Belsley alludes in his example of consumption
functions. In observational studies it is not necessarily assumed
that constant terms are inherent to correct model specification
(e.g., consumption can only be zero when the constant term is zero).
Each of these perspectives should be recognized as legitimate when
appropriate.

Centering can be either beneficial or detrimental regardless
of whether one's perspective is derived from industrial experimentation
or observational studies. Centering replaces each predictor
variable with the residuals from a least squares regression
of that variable on the constant term. In an intuitive sense,
centered predictor variables contain no common information with
the constant term of the model. In addition, centering alters
the constant term:

    $$y = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$

becomes

    $$y = \beta_0 + \beta_1 w_1 + \beta_2 w_2 + \cdots + \beta_p w_p + \varepsilon , \qquad (1)$$

where

    $$\beta_0 = \alpha + \beta_1 \bar{x}_1 + \beta_2 \bar{x}_2 + \cdots + \beta_p \bar{x}_p \quad\text{and}\quad w_j = x_j - \bar{x}_j .$$

Inferences on $\beta_0$ are all but meaningless since the $\bar{x}_j$ are data-dependent;
an exception sometimes arises when the predictor-variable
values are predetermined in a designed experiment. In
general, then, if one wishes to make inferences on the level of
the response variable (including tests for no-intercept models),
centering is pointless. On the other hand, if one wishes to
draw inferences on whether the predictor variables contribute
to the fit of the response variable in addition to the constant
term (i.e., the response variability is not simply due to random
fluctuation about its level) then centered predictor variables
are essential.
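
The effect of centering on the constant term can be checked numerically. The sketch below uses simulated data and hypothetical names; it fits the same response with raw and with centered predictors and compares the two sets of estimates. Under least squares the slope estimates should coincide, and the centered-model constant should equal the raw-model constant plus the slope-weighted predictor means, which is also the response mean.

```python
import numpy as np

def fit_with_intercept(X, y):
    """Least squares fit of y on a constant column plus the columns of X."""
    D = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef[0], coef[1:]

# Hypothetical data: two predictors with nonzero means.
rng = np.random.default_rng(1)
X = rng.normal(10.0, 2.0, size=(30, 2))
y = 4.0 + X @ np.array([1.5, -0.5]) + rng.normal(0.0, 0.3, 30)

alpha_hat, slopes_raw = fit_with_intercept(X, y)            # raw predictors
x_bar = X.mean(axis=0)
beta0_hat, slopes_centered = fit_with_intercept(X - x_bar, y)  # centered predictors
```

Because the centered constant depends on the observed predictor means, it changes whenever the sample of predictor values changes, which is the data dependence noted above.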

In this latter setting one need not sacrifice diagnostic
information about possible collinearity of the predictor variables
with the constant term. The estimated standard error of
the constant term of model (1) can be expressed as

    $$\operatorname{s.e.}(\hat\beta_0) = \hat\sigma\{n - 1'X(X'X)^{-1}X'1\}^{-1/2} = (\hat\sigma/n^{1/2})(1-R_0^2)^{-1/2} ,$$

where $R_0^2$ is the coefficient of determination when the constant
term is regressed on the other predictor variables. Note that
if in the definition of the model the columns of X are centered,
as can occur in an experimental design, $\operatorname{s.e.}(\hat\beta_0) = \hat\sigma/n^{1/2}$;
otherwise, $\operatorname{s.e.}(\hat\beta_0) > \hat\sigma/n^{1/2}$. Consequently,

    $$\operatorname{s.e.}(\hat\beta_0)^2/(\hat\sigma^2/n) = (1-R_0^2)^{-1}$$

is the variance inflation factor for the constant term of the
model. For the example in Section 1,

    $$\operatorname{s.e.}(\hat\beta_0)^2/(\hat\sigma^2/n) = (.784)^2/([.0055]^2/20) > 400{,}000 .$$

There is clearly a collinearity problem among the predictor variables
and the constant term. Centering does not demean the collinearity
diagnostics; one must simply understand the nature of centering
and know where to look for the appropriate diagnostic.
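
A minimal sketch of this diagnostic follows. The function name and the simulated predictors are hypothetical; only the final line uses numbers quoted in the text. The function regresses a column of ones on the uncentered predictors (no intercept) and returns one over one minus the resulting uncentered coefficient of determination.

```python
import numpy as np

def constant_term_vif(X):
    """Variance inflation factor for the constant term.

    X is an n x p array of nonconstant predictors. The coefficient of
    determination is the uncentered one from regressing a column of ones
    on the columns of X.
    """
    n = X.shape[0]
    ones = np.ones(n)
    coef, *_ = np.linalg.lstsq(X, ones, rcond=None)
    resid = ones - X @ coef
    r0_squared = 1.0 - (resid @ resid) / n   # sum of squares of the ones column is n
    return 1.0 / (1.0 - r0_squared)

# Hypothetical predictors that are nearly constant (coefficient of variation near .0023).
rng = np.random.default_rng(0)
X = 100.0 + 0.23 * rng.standard_normal((20, 3))
vif_raw = constant_term_vif(X)
vif_centered = constant_term_vif(X - X.mean(axis=0))   # centered columns give no inflation

# The ratio quoted in the text for the example.
quoted_ratio = 0.784 ** 2 / (0.0055 ** 2 / 20)
```

With centered columns the ones vector is orthogonal to every predictor, so the factor is one; with nearly constant raw predictors it becomes very large.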


2. MEASURING COLLINEARITY 

Comparisons of collinearity measures provide valuable guidelines
for data analysts. Farrar and Glauber (1967), Leamer (1973),
Mason, Gunst, and Webster (1975), Willan and Watts (1978), and
Belsley, Kuh, and Welsch (1980) describe a wide variety of important
collinearity diagnostics; however, rarely will any single collinearity
measure completely characterize the nature and effects of collinear
predictor variables. Some measures are appropriate for assessing
the sensitivity of least squares estimates to minor perturbations
of the input data (condition indices), others more readily measure
the effects of collinearity on the variances of the estimators
(variance inflation factors), still others aid in identifying the
nature of the collinearities (predictor-variable correlations,
eigenvalues and eigenvectors of suitably scaled, perhaps centered,
X'X matrices). Table 1 lists selected collinearity measures from
the above references according to possible usage.

[Insert Table 1] 

Just as there is no monopoly by any single collinearity
measure on usefulness, collinearity itself is difficult to define.
Professor Belsley stresses conditioning as a descriptor of collinearity.
This author's preference is to define collinearity by
analogy with the algebraic definition of linear dependence among
a (normalized) set of vectors (Gunst 1983):

    Defn. A collinearity is said to exist among the columns
    of $X = [x_1, x_2, \ldots, x_p]$ if for a suitably small
    predetermined $\eta > 0$ there exist constants $c_1, c_2, \ldots, c_p$,
    not all of which are zero, such that

    $$c_1x_1 + c_2x_2 + \cdots + c_px_p = \delta \quad\text{with}\quad \|\delta\| < \eta\,\|c\| .$$

Neither this definition nor any other which can be offered is
entirely satisfactory (e.g., How small should $\eta$ be? How large
should a condition index be?) but each is a meaningful concept
to many data analysts, depending again on background and experience.
Hocking and Pendleton's (1983) "picket fence" analogy
is a marvelously simple geometric explanation of collinearity
which can be more useful than either of the above technical
definitions when one must characterize predictor-variable
redundancies to those who have limited statistical training.
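
One way to apply the definition numerically is sketched below, under the assumption that the columns are first scaled to unit length. The smallest value of the ratio of the norm of the linear combination to the norm of the coefficient vector is the smallest singular value of the scaled matrix, so a collinearity in the sense of the definition exists exactly when that singular value falls below the chosen threshold. The function name, the threshold of 0.05 and the data are hypothetical.

```python
import numpy as np

def find_collinearity(X, eta):
    """Return (exists, c) for the unit-length-scaled columns of X.

    exists is True when some nonzero c gives ||Zc|| < eta * ||c||;
    c is the unit vector minimizing ||Zc||.
    """
    Z = X / np.linalg.norm(X, axis=0)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    return bool(s[-1] < eta), vt[-1]

# Hypothetical predictors: the third column is nearly the sum of the first two.
rng = np.random.default_rng(3)
x1 = rng.standard_normal(40)
x2 = rng.standard_normal(40)
x3 = x1 + x2 + 1e-3 * rng.standard_normal(40)
X_near = np.column_stack([x1, x2, x3])

exists_near, c_near = find_collinearity(X_near, eta=0.05)
exists_orth, _ = find_collinearity(np.eye(3), eta=0.05)
```

The returned coefficient vector indicates which columns take part in the near dependence; how small the threshold should be remains the open question raised in the text.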

Even from a strictly analytic point of view there are
difficulties with all measures of collinearity, difficulties
which limit the global utility of each diagnostic. Belsley,
Kuh, and Welsch contend that small eigenvalues of X'X are
inadequate measures of collinearity since perfectly-conditioned
matrices of the form



can have arbitrarily small eigenvalues for small values of $\alpha > 0$
even though the two columns of X are orthogonal (no collinearity).
Condition indices suffer from much the same problem if one examines
perfectly-conditioned matrices of the form



and $\beta > 0$ is allowed to become arbitrarily small while $\alpha > 0$ is
held fixed. Note that if one scales the columns of X to unit
length both of the above matrices are identity matrices for all
$\alpha, \beta > 0$ and both the eigenvalues of X'X and the condition index
of X will correctly diagnose the perfect conditioning.
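
The two displayed matrices are missing from this copy. Purely for illustration, the sketch below assumes the forms X = alpha*I and X = diag(alpha, beta), which match the surrounding description (orthogonal columns that become the identity after scaling to unit length); these forms, the numerical values and the function names are assumptions, not the author's displays.

```python
import numpy as np

def eigen_and_condition(X):
    """Eigenvalues of X'X and the largest condition index of X, via singular values."""
    s = np.linalg.svd(X, compute_uv=False)
    return s ** 2, s.max() / s.min()

def unit_length(X):
    """Scale each column of X to unit Euclidean length."""
    return X / np.linalg.norm(X, axis=0)

alpha, beta = 1e-4, 1e-8                 # hypothetical small constants
X1 = alpha * np.eye(2)                   # assumed first form
X2 = np.diag([alpha, beta])              # assumed second form

eig1, cond1 = eigen_and_condition(X1)    # tiny eigenvalues, condition index of one
eig2, cond2 = eigen_and_condition(X2)    # large condition index despite orthogonal columns
eig1_scaled, cond1_scaled = eigen_and_condition(unit_length(X1))
eig2_scaled, cond2_scaled = eigen_and_condition(unit_length(X2))
```

After scaling, both matrices give unit eigenvalues and a unit condition index, which is the point made in the final sentence above.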

3. STRUCTURAL INTERPRETABILITY 

A major contribution of this article, one with which I am in
wholehearted agreement, is its focus on correct model formulation.
Model formulation receives scant attention in regression textbooks,
most of the emphasis being directed to variable transformations.
Structural interpretability demands that careful consideration be
directed toward the initial specification of each variable in a
regression model. Professor Belsley specifically directs his
comments to centering but the overall admonition which he conveys
is more general.

The question of variable definition is a difficult one. It
is too frequent that one encounters "proxy" or "surrogate" variables
used in place of the real quantities one seeks but is unable to
measure. The rationalization that this is the best one can do leads
to acceptance of arbitrary variable definitions in many other circumstances.
For example, few worry about whether temperature is
expressed in degrees Celsius or Fahrenheit. Yet this is perhaps
the traditional example one uses to distinguish interval- from
ratio-scaled variables in introductory statistics courses, a univariate
version of structural interpretability.

Structural interpretability as a universally-accepted principle
in model formulation must await refinement. The relevance of structural
interpretability, at least with regard to centering, is not
clear when quantities such as "beta-weights" are the intended goal
of a regression analysis. Calculation of these quantities requires
that predictor variables be standardized, a transformation which
destroys structural interpretability. Extension to polynomial
models, models which include interaction terms, and nonlinear models
(admittedly topics which are beyond the scope of the present manuscript)
are additional important refinements which await clarification.
Nevertheless, the warning is clear: challenges to interpretability
are inevitable when model formulation, including centering,
is slighted.

Finally, structural interpretability is a concept which is
meaningful without any reference to collinearity. The application
of structural interpretability to regression implies that a constant
term is included in the model if appropriate, not of necessity.
Collinearity is a separate issue which becomes relevant only after
proper model formulation. In order to properly diagnose collinearity,
variates should be centered or standardized as required to apply
appropriate analytical techniques.

4. THE EXAMPLE 

Smith and Campbell (1980) discuss clearly the relationship 
between (linear) predictor-variable transformations and the dis- 
guising of collinearities as small predictor-variable variances. 
Mullet (1976) demonstrates that the ill-effects usually associated
with collinearities can be produced by other causes, including small 
predictor-variable variances. These discussions have particular 
relevance to collinearities involving the constant term. 

Earlier the collinearity among the three predictor variables
in the example was shown to be diagnosable from the standard error
of the estimated coefficient of the constant term. That the "nonconstant"
predictor variables are essentially constant is apparent
from their coefficients of variation ($s_j/\bar{x}_j$): each is approximately
.0023. Regardless of whether each is correlated with the other,
a coefficient of variation this small calls for immediate investigation
of collinearity if one's intent is to evaluate each of the
predictor variables for its predictive ability without regard to
the presence or absence of the others, including a constant term.

A collinearity with the constant term occurs either because
two or more of the nonconstant predictors are (reasonably) variable
and some linear combination of them is essentially constant
or because individual variates are essentially constant. The
former situation can be detected from an analysis of the centered
(standardized) variates, the latter from the coefficients of variation
(often just from the standard deviations). In either
case, if collinearity with the constant term is a concern, examination
of the standard error of the constant term will readily
reveal the existence of a problem.

5. FINAL REMARKS

This manuscript is an excellent example of the dialogue which
should periodically review the foundations upon which regression
methodology is based. While differences of opinion will inevitably
arise, separation of the fundamental issues from personal preference
is important. I am in fundamental agreement with Professor Belsley
on what I perceive to be the key issues in his article: (i) model
formulation, using concepts like structural interpretability, is
essential for meaningful inferences from a regression analysis,
(ii) careful consideration should be given to whether collinearity
with the constant term is important to detect, and (iii) if so,
collinearity diagnostics which enable such detection must be
examined. While we may prefer alternative collinearity diagnostics,
we seek the same goal with equally-effective diagnostic
techniques.

TABLE 1


Selected Collinearity Measures 


Detection Estimator 

Measures Effects Precision 


Predictor-Variable Correlations

Condition Indices

Variance Inflation Factors

Variance Inflation Factors

Estimator Correlations

s.e.($\hat\beta_j$), s.e.($\hat y$)

Eigenvalues, Eigenvectors of X'X

Curve Decolletages

Variance Decomposition Proportions

Condition Indices

Volumes of Confidence Ellipsoids


REGRESSION DIAGNOSTICS AND APPROXIMATE INFERENCE PROCEDURES 
FOR PENALIZED LEAST SQUARES ESTIMATORS 


R. L. Eubank and R. F. Gunst* 


ABSTRACT 

Generalizations of least squares diagnostic techniques are
presented for a class of penalized least squares estimators.
Efficient computation of these diagnostics is afforded by expressions
which relate coefficient estimates and residuals from fits to subsets
of the data to the corresponding quantities from a fit to the
complete data set. From these expressions approximate confidence
intervals and test statistics can be obtained using jackknife and
bootstrap procedures. Applications are discussed for the special
cases of smoothing splines and ridge regression.


KEY WORDS 

Bootstrap confidence intervals; Jackknife confidence intervals;
Leverage values; Ridge regression; Smoothing splines; Studentized
residuals


AUTHOR’S FOOTNOTE 


* R. L. Eubank is Assistant Professor and R. F. Gunst is Associate
Professor and Chairman, Department of Statistics, Southern Methodist
University. This research was supported in part by Office of Naval
Research Contract No. N00014-82-K-0207 and by the National Aeronautics
and Space Administration under Contract No. NCC 9-9. The
authors would like to thank Mr. Tom Carmody for furnishing Figure
1 and Dr. Paul Speckman for helpful discussions on the material in
this manuscript.



1. INTRODUCTION 

Regression diagnostics are an integral component of compre- 
hensive regression modeling efforts, in large part because of 
recent theoretical advances which lead to computational efficiency. 
With few exceptions (a notable one being Pregibon (1981)) these 
advances have been restricted to ordinary least squares (OLS) esti- 
mation for linear models. In this paper diagnostic techniques are 
extended to a class of penalized least squares estimators which 
include smoothing splines and ridge regression estimators as 
special cases. An additional benefit of these results is the 
ability to efficiently compute jackknife confidence intervals and 
other inferential statistics for model parameters. 

Let $y = (y_1,\ldots,y_n)'$ be a vector of observed responses which
follow the model

    $$y = \eta + \varepsilon , \qquad (1.1)$$

where $\eta = (\eta_1,\ldots,\eta_n)'$ is a vector of unknown constants and
$\varepsilon = (\varepsilon_1,\ldots,\varepsilon_n)'$ is a vector of zero mean, uncorrelated errors with
common variance $\sigma^2$. It is assumed that $\eta$ is to be approximated
by a linear form $X\beta$ where X is a known $n \times p$ matrix of rank $p \le n$
having ith row $x_i'$ and $\beta = (\beta_1,\ldots,\beta_p)'$ is a vector of parameters
which is to be estimated. The class of estimators which are
investigated in this article are those obtained as the solution to

    $$\min_{\beta}\Big\{\sum_{j=1}^{n}(y_j - x_j'\beta)^2 + \lambda\beta'Q\beta\Big\}, \quad \lambda \ge 0 , \qquad (1.2)$$

with Q denoting an arbitrary positive (semi-) definite matrix.
For a given Q, X, and $\lambda$, expression (1.2) has a unique solution:

    $$\hat\beta = C(\lambda)y , \qquad (1.3)$$

where

    $$C(\lambda) = (X'X + \lambda Q)^{-1}X' . \qquad (1.4)$$

The estimator $\hat\beta$ is termed a penalized least squares estimator of
$\beta$. Observe that when $\lambda = 0$, $\hat\beta$ reduces to the OLS estimator

    $$\hat\beta = (X'X)^{-1}X'y .$$

At the other extreme, if Q is positive definite $\hat\beta \to 0$ as $\lambda \to \infty$.
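
The estimator (1.3) can be sketched directly from its definition. The code below is a minimal illustration with simulated data; the function name, the identity penalty matrix and the penalty values are hypothetical choices (an identity Q corresponds to ordinary ridge regression). It solves the linear system rather than forming an explicit inverse.

```python
import numpy as np

def penalized_ls(X, y, Q, lam):
    """Solve (X'X + lam*Q) b = X'y, the penalized least squares estimate."""
    return np.linalg.solve(X.T @ X + lam * Q, X.T @ y)

# Hypothetical data and an identity penalty matrix.
rng = np.random.default_rng(4)
X = rng.standard_normal((25, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.standard_normal(25)
Q = np.eye(3)

b_ols = penalized_ls(X, y, Q, 0.0)       # zero penalty: ordinary least squares
b_mid = penalized_ls(X, y, Q, 5.0)       # an intermediate amount of shrinkage
b_big = penalized_ls(X, y, Q, 1e8)       # very large penalty: close to the zero vector
```

The two extreme calls reproduce the limiting behavior described above; the intermediate estimate satisfies the stationarity condition of (1.2), namely that X'(y - Xb) equals the penalty parameter times Qb.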

In many instances it is preferable to use a value of $\lambda$ between
these two extremes and a variety of methods are available for estimating
its value from data. For example, Golub, Heath and Wahba (1979)
discuss generalized cross-validation (GCV) as well as other data-driven
methods for selecting $\lambda$.

It is often reasonable to make the stronger assumption that
$\eta = X\beta$ under which model (1.1) becomes the linear regression model

    $$y = X\beta + \varepsilon . \qquad (1.6)$$

When this model holds and no further assumptions are made, $\hat\beta$ will
be termed a generalized ridge regression estimator of $\beta$; however,
the results presented below are of sufficient generality to include
cases in which the $\eta_i$ represent values from an unknown regression
function, $\eta$, which is to be estimated nonparametrically. When
appropriately formulated (see Section 5) the smoothing spline estimator
of $\eta$ is seen to be a special case of estimator (1.3).

As with ordinary least squares, the penalized least squares
"hat matrix" (see Hoaglin and Welsch 1978) provides important
diagnostic information about the influence of individual observations
$(y_i, x_i')$ on the associated prediction equation. The hat
matrix corresponding to $\hat\beta$ is defined to be

    $$H(\lambda) = \{h_{ij}(\lambda)\} = XC(\lambda) . \qquad (1.7)$$

This matrix transforms the response vector y to the vector of
fitted values, $\hat y = (\hat y_1,\ldots,\hat y_n)'$; i.e.,

    $$\hat y = H(\lambda)y .$$

The element $h_{ij}(\lambda)$ is a direct measure of the influence of $y_j$ on the
fit to $y_i$. In particular, the "leverage value" $h_{ii}(\lambda)$ measures the
influence of $y_i$ on its own prediction.
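
A short sketch of the penalized hat matrix and its leverage values follows. The data, the identity penalty matrix and the names are hypothetical. The code forms the hat matrix of (1.7), checks that it maps the responses to the fitted values, and extracts the diagonal as the leverage values.

```python
import numpy as np

def hat_matrix(X, Q, lam):
    """Penalized least squares hat matrix X (X'X + lam*Q)^{-1} X'."""
    return X @ np.linalg.solve(X.T @ X + lam * Q, X.T)

# Hypothetical data with a constant column and two predictors.
rng = np.random.default_rng(5)
X = np.column_stack([np.ones(15), rng.standard_normal((15, 2))])
y = rng.standard_normal(15)
Q = np.eye(3)

H0 = hat_matrix(X, Q, 0.0)               # ordinary least squares hat matrix
H5 = hat_matrix(X, Q, 5.0)               # penalized hat matrix
fitted = H5 @ y                          # fitted values from the penalized fit
leverage_ols = np.diag(H0)
leverage_pen = np.diag(H5)
```

With a positive semidefinite Q each leverage value can only stay the same or shrink as the penalty parameter increases, so the trace of the penalized hat matrix is at most p.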

This study of the estimator class (1.3) begins with a derivation
of some of the properties of $H(\lambda)$ in Section 2. In Section 3
techniques are presented for computing estimates and fitted values
when observations are deleted from the data set. The results of
this section are applied, in Section 4, to obtain approximate
inference procedures for the parameter vector $\beta$ and to derive diagnostic
measures for detecting influential observations. Specific
applications to nonparametric estimation by smoothing splines
and to ridge regression estimators are detailed in Section 5. Concluding
remarks are made in Section 6.


2. LEVERAGE VALUES FOR PENALIZED LEAST SQUARES 

In this section certain properties of the hat matrix $H(\lambda)$ will
be derived. It will be seen that the characteristics of its elements
are closely related to those of the hat matrix $H$ for the corresponding
OLS estimator:

$$H = \{h_{ij}\} = X(X'X)^{-1}X'. \tag{2.1}$$

Since $H$ in equation (2.1) is an (orthogonal) projection operator,
the following properties are easily proven:

  i)   $0 \le h_{ii} \le 1$

  ii)  $-1 \le h_{ij} \le 1$, $i \ne j$                        (2.2)

  iii) $h_{ii} = 1 \Rightarrow h_{ij} = 0$, $i \ne j$.

When $X$ contains a constant column, somewhat sharper results are
provided by

  i)'   $n^{-1} \le h_{ii} \le 1$

  ii)'  $-(n-1)n^{-1} \le h_{ij} \le 1$, $i \ne j$             (2.3)

  iii)' $h_{ii} = 1 \Leftrightarrow h_{ij} = 0$, $i \ne j$.

Extreme rows of $X$ result in large leverage values. The rough
cutoff of $h_{ii} > 2p/n$ suggested by Hoaglin and Welsch (1978) is
often used to identify such rows. Note from iii) and iii)' that,
as $h_{ii} \to 1$, $h_{ij} \to 0$, $i \ne j$, and $\hat y_i = x_i'\hat\beta \to y_i$, indicating that an
observation with a large leverage value will tend to dominate its
own fit.
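The $2p/n$ rule of thumb can be tried on a toy design. The sketch below is only an illustration under assumed inputs: the helper names `ols_leverages` and `flag_high_leverage` and the eleven-point design with one extreme predictor value are hypothetical. It computes the OLS leverages through a thin QR factorization, for which $H = Q_1Q_1'$.

```python
import numpy as np

def ols_leverages(X):
    """Diagonal of H = X (X'X)^{-1} X', computed from a thin QR factorization."""
    Qfac, _ = np.linalg.qr(X)           # X = Qfac R, hence H = Qfac Qfac'
    return np.sum(Qfac ** 2, axis=1)

def flag_high_leverage(X):
    """Indices of rows whose leverage exceeds the rough cutoff 2p/n."""
    n, p = X.shape
    return np.flatnonzero(ols_leverages(X) > 2.0 * p / n)

# Toy design: a constant column plus one predictor with a single extreme value.
x = np.array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 100.])
X_toy = np.column_stack([np.ones_like(x), x])
h_toy = ols_leverages(X_toy)
flagged = flag_high_leverage(X_toy)     # only the last row is flagged
```

In this toy design the leverages sum to $p = 2$, none falls below $1/n$ (property i)'), and the extreme row has leverage close to 1, so it largely determines its own fitted value.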

For $\lambda > 0$, $H(\lambda)$ is no longer a projection matrix. The following
theorem establishes bounds for the elements of $H(\lambda)$ as a function
of the corresponding elements of $H$, thereby providing an analog
of properties i) and ii) in equation (2.2).

Theorem 2.1. The elements of $H(\lambda)$ satisfy

$$|h_{ij}(\lambda)| \le (1 + \lambda d_1)^{-1}\{h_{ii}h_{jj}\}^{1/2} \tag{2.4}$$

where $d_1$ is the smallest eigenvalue of $(X'X)^{-1}Q$.

Proof. Using the spectral decomposition (e.g., Kshirsagar 1972,
Chapter 7) of $X$ write $X = UL^{1/2}Z'$, where $L = \mathrm{diag}(\ell_1, \ldots, \ell_p)$
is a diagonal matrix containing the nonzero eigenvalues of $XX'$ (and
$X'X$) in ascending order, and $U = [u_1, \ldots, u_p]$ and $Z$ are the
corresponding matrices of eigenvectors of $XX'$ and $X'X$, respectively.
$H(\lambda)$ can now be expressed as

$$H(\lambda) = U(I + \lambda L^{-1/2}Z'QZL^{-1/2})^{-1}U'. \tag{2.6}$$

Let $0 \le d_1 \le d_2 \le \cdots \le d_p$ denote the eigenvalues of $L^{-1/2}Z'QZL^{-1/2}$
(which are also the eigenvalues of $(X'X)^{-1}Q$). Using $\Gamma = [\gamma_1, \ldots, \gamma_p]$
to denote the corresponding matrix of eigenvectors, individual elements
of $H(\lambda)$ can now be represented as

$$h_{ij}(\lambda) = \sum_{r=1}^{p} b_{ir}b_{jr}(1 + \lambda d_r)^{-1}, \qquad b_{kr} = u_k'\gamma_r. \tag{2.7}$$

Application of the Cauchy-Schwarz inequality in equation (2.7) along
with the ordering of the $d_r$ completes the proof. $\Box$
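The bound in Theorem 2.1 can be checked numerically. The sketch below is a sanity check rather than a proof; the function name `theorem_2_1_slack`, the random design, and the randomly generated positive semidefinite $Q$ are all assumptions introduced for the example. It reports the smallest value of the bound minus $|h_{ij}(\lambda)|$ over all pairs $(i, j)$, which the theorem says should be nonnegative.

```python
import numpy as np

def theorem_2_1_slack(X, Q, lam):
    """Smallest value of bound - |h_ij(lam)| over all (i, j) pairs."""
    XtX = X.T @ X
    H = X @ np.linalg.solve(XtX, X.T)                 # OLS hat matrix
    H_lam = X @ np.linalg.solve(XtX + lam * Q, X.T)   # penalized hat matrix
    # d_1: smallest eigenvalue of (X'X)^{-1} Q (real for symmetric psd Q).
    d1 = np.linalg.eigvals(np.linalg.solve(XtX, Q)).real.min()
    h = np.diag(H)
    bound = np.sqrt(np.outer(h, h)) / (1.0 + lam * d1)
    return float((bound - np.abs(H_lam)).min())

# Hypothetical inputs for the check.
rng = np.random.default_rng(1)
X = rng.normal(size=(15, 4))
B = rng.normal(size=(4, 4))
Q = B.T @ B                       # an arbitrary symmetric psd penalty matrix
slacks = [theorem_2_1_slack(X, Q, lam) for lam in (0.0, 0.1, 1.0, 10.0)]
```

At $\lambda = 0$ the slack is zero (up to rounding) because the bound is attained on the diagonal; for $\lambda > 0$ it stays nonnegative.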

Theorem 2.1 and its proof have several important consequences.
First, the theorem furnishes tighter bounds for the elements of $H(\lambda)$ than
the inequalities in equation (2.2); i.e.,

  i)  $0 \le h_{ii}(\lambda) \le (1 + \lambda d_1)^{-1}$

  ii) $-(1 + \lambda d_1)^{-1} \le h_{ij}(\lambda) \le (1 + \lambda d_1)^{-1}$, $i \ne j$.      (2.8)

In addition, from equation (2.7), it is apparent that $h_{ii}(\lambda)$ is
monotonically decreasing with $\lambda$ from $h_{ii}(0) = h_{ii}$ to $h_{ii}(\infty)$. Note
that in general $h_{ii}(\infty) > 0$ unless $d_1 > 0$; when $d_1 > 0$, $h_{ii}(\infty) = 0$.
Since $h_{ij}(\lambda)$ is continuous in $\lambda$, standard results from calculus can
be used to show that for $\lambda$ sufficiently small (large) $h_{ij}(\lambda)$ will
have the same sign as $h_{ij}$ ($h_{ij}(\infty)$) provided that $h_{ij} \ne 0$ ($h_{ij}(\infty) \ne 0$).

Two important special cases occur when (i) $0 = d_1 = \cdots = d_m
< d_{m+1} \le \cdots \le d_p$ and (ii) $Q = I$. These special cases have
applications to smoothing splines and ridge regression, respectively,
which will be explored in Section 5. The important details are
summarized in the following two corollaries.

Corollary 1. Suppose $0 = d_1 = \cdots = d_m < d_{m+1} \le \cdots \le d_p$ and
define $h_{ij}(\infty) = \sum_{r=1}^{m} b_{ir}b_{jr}$, where the $b_{kr}$ are as in equation (2.7).
Then

$$h_{ii}(\infty) + (1 + \lambda d_p)^{-1}\sum_{r=m+1}^{p} b_{ir}^2 \le h_{ii}(\lambda) \le h_{ii}(\infty) + (1 + \lambda d_{m+1})^{-1}\sum_{r=m+1}^{p} b_{ir}^2. \tag{2.9}$$

Corollary 2. If $\tilde\beta = (X'X + \lambda I)^{-1}X'y$ then

$$|h_{ij}(\lambda)| \le \ell_p(\ell_p + \lambda)^{-1}\{h_{ii}h_{jj}\}^{1/2}$$

where $\ell_p$ is the largest eigenvalue of $X'X$. The upper bound for
the $i$th leverage value, viz. $\ell_p(\ell_p + \lambda)^{-1}$, is obtained when $x_i' = \ell_p^{1/2}z_p'$,
where $z_p$ is the eigenvector corresponding to $\ell_p$.
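Corollary 2 can be illustrated with a design for which the bound is attained. In the sketch below the $2 \times 2$ matrix `X_attain`, the value $\lambda = 0.5$, and the helper names are hypothetical choices for the example: $X'X = \mathrm{diag}(4, 1)$, so $\ell_p = 4$, $z_p = (1, 0)'$, and the first row equals $\ell_p^{1/2}z_p'$.

```python
import numpy as np

def ridge_leverages(X, lam):
    """Diagonal of H(lam) = X (X'X + lam*I)^{-1} X'."""
    p = X.shape[1]
    H_lam = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    return np.diag(H_lam)

def corollary_2_bound(X, lam):
    """Upper bound l_p / (l_p + lam), with l_p the largest eigenvalue of X'X."""
    ell_p = np.linalg.eigvalsh(X.T @ X).max()
    return ell_p / (ell_p + lam)

# Row 1 equals sqrt(l_p) z_p' with l_p = 4 and z_p = (1, 0)'.
X_attain = np.array([[2.0, 0.0],
                     [0.0, 1.0]])
lam = 0.5
lev = ridge_leverages(X_attain, lam)       # h_11 = 4/4.5, h_22 = 1/1.5
bound = corollary_2_bound(X_attain, lam)   # 4/4.5
```

The first leverage equals the bound $4/4.5$, while the second, $1/1.5$, lies strictly below it.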


3. DELETING OBSERVATIONS FROM AN ESTIMATOR 


The development of exact tests and interval estimates for $\beta$
using the penalized least squares estimator $\tilde\beta$ is a difficult, and
as yet unresolved, problem. In contrast, approximate techniques
based on nonparametric procedures such as the jackknife and
bootstrap are easy to propose but their practicality depends on the
ability to efficiently perform the necessary calculations. In this
section a simple method of deleting observations from $\tilde\beta$ is derived
which requires no refitting of the data. This is found, in Section
4, to make the use of inference techniques such as jackknife
confidence regions for $\beta$ a practical alternative and to allow a
generalization of several types of regression diagnostic measures to the
penalized least squares setting.

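For $q \le n-p$ let $J = \{j_1, \ldots, j_q\}$ be a subset of the indices
$\{1, \ldots, n\}$ and let $\tilde\beta^{(J)}$ represent the coefficient estimates obtained
using only those $(y_j, x_j')$ with $j \notin J$. The following theorem provides
a partial characterization of $\tilde\beta^{(J)}$.

Theorem 3.1. Let $\tilde\beta^{(J)}(w_{j_1}, \ldots, w_{j_q})$ solve

$$\min_{\beta}\Big\{\sum_{j \notin J}(y_j - x_j'\beta)^2 + \sum_{j \in J}(w_j - x_j'\beta)^2 + \lambda\beta'Q\beta\Big\}$$

and define $\tilde y_i^{(J)} = x_i'\tilde\beta^{(J)}$, $i = 1, \ldots, n$. Then,

$$\tilde\beta^{(J)}(\tilde y_{j_1}^{(J)}, \ldots, \tilde y_{j_q}^{(J)}) = \tilde\beta^{(J)}, \tag{3.1}$$

$$\tilde\beta^{(J)} = \tilde\beta + \sum_{j \in J} c_j(\lambda)(\tilde y_j^{(J)} - y_j), \tag{3.2}$$

with $c_j(\lambda)$ the $j$th column of $C(\lambda)$.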

Theorem 3.1 has the consequence that $\tilde\beta^{(J)}$ can be obtained by
applying $C(\lambda)$ to a "new data vector" wherein $y_j$ has been replaced
by $\tilde y_j^{(J)}$ for all $j \in J$. This would seem to presuppose knowledge of
$\tilde\beta^{(J)}$; however, such is not the case and in many cases of interest
it is possible to compute the $\tilde y_j^{(J)}$ without explicit computation of
$\tilde\beta^{(J)}$. This property follows by application of the next theorem.


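Theorem 3.2. The values $\tilde y_j^{(J)}$, $j \in J$, satisfy the linear equation
system

$$\tilde y_i^{(J)} - \sum_{j \in J} h_{ij}(\lambda)\tilde y_j^{(J)} = \tilde y_i - \sum_{j \in J} h_{ij}(\lambda)y_j = \sum_{j \notin J} h_{ij}(\lambda)y_j, \quad i \in J. \tag{3.3}$$

Proof of Theorems 3.1-3.2. Set $w_j = \tilde y_j^{(J)}$. Proof of Theorem 3.1
is provided by the following inequalities, which hold for any $\beta$:

$$\sum_{j \notin J}(y_j - x_j'\tilde\beta^{(J)})^2 + \sum_{j \in J}(\tilde y_j^{(J)} - x_j'\tilde\beta^{(J)})^2 + \lambda\tilde\beta^{(J)\prime}Q\tilde\beta^{(J)}$$

$$= \sum_{j \notin J}(y_j - x_j'\tilde\beta^{(J)})^2 + \lambda\tilde\beta^{(J)\prime}Q\tilde\beta^{(J)} \le \sum_{j \notin J}(y_j - x_j'\beta)^2 + \lambda\beta'Q\beta$$

$$\le \sum_{j \notin J}(y_j - x_j'\beta)^2 + \sum_{j \in J}(\tilde y_j^{(J)} - x_j'\beta)^2 + \lambda\beta'Q\beta.$$

To verify equation (3.3) note that $x_i'\tilde\beta^{(J)}(w_{j_1}, \ldots, w_{j_q})$ is linear
in $w_j$, $j \in J$, and can, therefore, be written as

$$x_i'\tilde\beta^{(J)}(w_{j_1}, \ldots, w_{j_q}) = x_i'\tilde\beta + \sum_{j \in J} h_{ij}(\lambda)(w_j - y_j). \tag{3.4}$$

Letting $w_j = \tilde y_j^{(J)} = x_j'\tilde\beta^{(J)}$ gives the desired result. $\Box$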
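The way Theorems 3.1 and 3.2 avoid refitting can be sketched as follows. This is a minimal illustration under assumed inputs, not the authors' code: the function names are hypothetical. The first function solves the $q \times q$ system (3.3) for the deleted fitted values and then, using the linearity of the estimator in $y$, applies $C(\lambda)$ to the data vector in which $y_j$, $j \in J$, has been replaced by those values. The second function is a brute-force refit used only for comparison.

```python
import numpy as np

def deleted_fit_via_3_3(X, y, Q, lam, J):
    """Deleted fitted values and coefficients from the full-data fit only."""
    J = np.asarray(J)
    C = np.linalg.solve(X.T @ X + lam * Q, X.T)    # C(lam)
    H = X @ C                                      # H(lam)
    beta = C @ y
    y_fit = H @ y
    H_JJ = H[np.ix_(J, J)]
    # System (3.3): (I - H_JJ) y_del = y_fit[J] - H_JJ y[J]
    y_del = np.linalg.solve(np.eye(len(J)) - H_JJ, y_fit[J] - H_JJ @ y[J])
    # Replace y_j by y_del and reapply C(lam); by linearity this is an update.
    beta_del = beta + C[:, J] @ (y_del - y[J])
    return y_del, beta_del

def refit_without(X, y, Q, lam, J):
    """Brute-force penalized fit with the rows in J removed (for comparison)."""
    keep = np.setdiff1d(np.arange(len(y)), J)
    return np.linalg.solve(X[keep].T @ X[keep] + lam * Q, X[keep].T @ y[keep])
```

The penalty term $\lambda\beta'Q\beta$ is left unchanged when rows are removed, which is the setting of Theorem 3.1; under that convention the two routes agree to rounding error whenever the system (3.3) is nonsingular.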


To illustrate uses for Theorems 3.1-3.2 confine attention, for
the moment, to the instance $q = 1$, $J = \{j\}$ for some $j \in \{1, \ldots, n\}$.
To distinguish this important special case the notation

$$\tilde\beta^{[j]} = \tilde\beta^{(J)} \tag{3.5}$$

and

$$\tilde y_i^{[j]} = x_i'\tilde\beta^{[j]} \tag{3.6}$$

is utilized. Application of Theorem 3.2 to this special case yields
the following expression for $\tilde y_j^{[j]}$:

$$\tilde y_j^{[j]} = (\tilde y_j - h_{jj}(\lambda)y_j)/(1 - h_{jj}(\lambda)). \tag{3.7}$$

This relationship explicitly demonstrates the ability to obtain
each of the $\tilde y_j^{[j]}$ without refitting the model.

The term "deleted residual" will be used to designate the
difference $y_j - \tilde y_j^{[j]}$. Equation (3.7) provides an efficient
computational form for the deleted residual; viz.,

$$e_{[j]} = y_j - \tilde y_j^{[j]} = e_j/(1 - h_{jj}(\lambda)), \tag{3.8}$$

where $e_j$ is the $j$th residual from the fit to the entire data set:
$e_j = y_j - \tilde y_j$.

Substituting equation (3.8) into equation (3.2) yields

$$\tilde\beta^{[j]} = \tilde\beta - c_j(\lambda)e_{[j]} \tag{3.9}$$

$$\phantom{\tilde\beta^{[j]}} = \tilde\beta - c_j(\lambda)e_j/(1 - h_{jj}(\lambda)), \tag{3.10}$$

where $c_j(\lambda)$ is the $j$th column of $C(\lambda)$.

Formulas (3.8) and (3.10) include as special cases the equivalent
expressions for ordinary least squares, $\lambda = 0$ (e.g., Beckman
and Trussell 1974; Hoaglin and Welsch 1978). In the case of smoothing
splines equation (3.8) was established by Craven and Wahba (1979)
using a method of proof similar to the one employed here.
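For single-observation deletion the shortcut takes only a few lines. The sketch below assumes the update $\tilde\beta^{[j]} = \tilde\beta - c_j(\lambda)e_{[j]}$ with $e_{[j]} = e_j/(1 - h_{jj}(\lambda))$, as implied by (3.8) and the pseudo-value formula (4.1); the function name `deleted_quantities` is hypothetical. All $n$ deleted residuals and deleted coefficient vectors are obtained from one fit to the full data.

```python
import numpy as np

def deleted_quantities(X, y, Q, lam):
    """Deleted residuals e_[j] and deleted coefficients beta^[j] for every j."""
    C = np.linalg.solve(X.T @ X + lam * Q, X.T)   # C(lam); column j is c_j(lam)
    beta = C @ y
    h = np.einsum("ij,ji->i", X, C)               # diagonal of H(lam) = X C(lam)
    e = y - X @ beta                              # full-data residuals
    e_del = e / (1.0 - h)                         # deleted residuals, eq. (3.8)
    beta_del = beta[:, None] - C * e_del          # column j holds beta^[j]
    return e_del, beta_del
```

A brute-force loop that drops one row at a time and refits (keeping $\lambda Q$ fixed) gives the same numbers, at the cost of $n$ additional linear solves.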


4. INFERENCE AND DIAGNOSTICS 


Equation (3.8) provides a fundamental expression for the
derivation of approximate confidence intervals to complement the
point estimator $\tilde\beta$. Define the $j$th vector of pseudo-values by

$$b_j = n\tilde\beta - (n-1)\tilde\beta^{[j]} = \tilde\beta + (n-1)c_j(\lambda)e_{[j]}. \tag{4.1}$$

Then the jackknife estimator of $\beta$ based on $\tilde\beta$ is $b = n^{-1}\sum_{j=1}^{n} b_j$
and the variance-covariance matrix of $\tilde\beta$ or $b$ can be estimated by

$$V = \{n(n-1)\}^{-1}\sum_{j=1}^{n}(b_j - b)(b_j - b)'. \tag{4.2}$$

For a linear functional $a'\beta$, an approximate $100(1-\alpha)\%$ confidence
interval is provided by

$$a'\tilde\beta \pm Z_{\alpha/2}(a'Va)^{1/2} \quad \text{or} \quad a'b \pm Z_{\alpha/2}(a'Va)^{1/2} \tag{4.3}$$

where $Z_{\alpha/2}$ is the $100(1-\alpha/2)$ percentage point of the standard normal
distribution (critical values from a Student's t distribution with
$n-1$ degrees of freedom could be used in place of $Z_{\alpha/2}$ in expression
(4.3)). Notice that the interval estimates (4.3) can be computed
using information available entirely from the original fit. When
$\lambda = 0$, equations (4.1)-(4.2) reduce to formulae given in Miller (1974),
Hinkley (1977a), and Efron (1982, Chapter 3) for jackknifing $\hat\beta$.
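The jackknife quantities (4.1)-(4.3) can be assembled from a single fit, as in the sketch below. It is an illustration under stated assumptions: the function name `jackknife_interval` is hypothetical, the interval is centered at $a'\tilde\beta$ (the first form in (4.3)), and the normal critical value is taken from Python's standard library.

```python
import numpy as np
from statistics import NormalDist

def jackknife_interval(X, y, Q, lam, a, alpha=0.05):
    """Approximate 100(1-alpha)% jackknife interval for a'beta, eq. (4.3)."""
    n = len(y)
    C = np.linalg.solve(X.T @ X + lam * Q, X.T)   # C(lam)
    beta = C @ y
    h = np.einsum("ij,ji->i", X, C)               # leverages h_jj(lam)
    e_del = (y - X @ beta) / (1.0 - h)            # deleted residuals, eq. (3.8)
    # Pseudo-values (4.1): b_j = beta + (n-1) c_j(lam) e_[j]; row j of B is b_j'.
    B = (beta[:, None] + (n - 1) * C * e_del).T
    b = B.mean(axis=0)                            # jackknife estimator
    D = B - b
    V = D.T @ D / (n * (n - 1))                   # eq. (4.2)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # standard normal critical value
    half = z * np.sqrt(a @ V @ a)
    center = a @ beta
    return center - half, center + half, b, V
```

Replacing `z` by a Student's t critical value with $n-1$ degrees of freedom gives the wider interval mentioned above; the standard library has no t quantile function, so that variant would need an external package.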

Diagnostic measures which parallel those utilized for ordinary
least squares can also be derived as a result of (3.8) and (3.10). To
do so first note that a natural estimator of $\sigma^2$ associated with the
penalized least squares estimator $\tilde\beta$ is

$$\tilde\sigma^2 = \sum_{i=1}^{n} e_i^2/\mathrm{tr}(I - H(\lambda)) \tag{4.4}$$

where tr denotes the matrix trace. This estimator reduces to the
usual estimator of $\sigma^2$ associated with $\hat\beta$, namely $\hat\sigma^2 = \sum_{i=1}^{n} e_i^2/(n-p)$,
when $\lambda = 0$. The estimator (4.4) has been found to be quite effective
for spline smoothing by Wahba (1983). Studentized (deleted)
residuals can then be defined as

$$t_{[j]} = e_j/\{\tilde\sigma_{[j]}(1 - h_{jj}(\lambda))^{1/2}\} \tag{4.5}$$

where $\tilde\sigma_{[j]}^2$ is the estimator (4.4) computed from the reduced data set
wherein $(y_j, x_j')$ has been excluded. An explicit formula for $\tilde\sigma_{[j]}^2$ is

$$\tilde\sigma_{[j]}^2 = \sum_{i=1,\, i \ne j}^{n}(e_i + h_{ij}(\lambda)e_{[j]})^2/\mathrm{tr}(I - H^{[j]}(\lambda)) \tag{4.6}$$

with

$$\mathrm{tr}(I - H^{[j]}(\lambda)) = n - 1 - \sum_{i=1,\, i \ne j}^{n}[h_{ii}(\lambda) + h_{ij}(\lambda)^2/(1 - h_{jj}(\lambda))]. \tag{4.7}$$

To prove formulas (4.6)-(4.7) observe that $\tilde y_i^{[j]}$ can be written as
$\sum_{r \ne j} a_{ir}y_r$. The coefficients $a_{ir}$ can be deduced from equation (3.2)
and used to establish equation (4.7). The form of the numerator
follows easily from expression (3.10).
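Formulas (4.5)-(4.7) can be evaluated from the full-data hat matrix alone. The following sketch is illustrative, with a hypothetical function name. For each $j$ it forms the residuals of the remaining observations under the deleted fit, $e_i + h_{ij}(\lambda)e_{[j]}$, and the trace in (4.7), and from these the deleted variance estimate and the Studentized residual.

```python
import numpy as np

def studentized_deleted_residuals(X, y, Q, lam):
    """Return t_[j], the deleted variance estimates, and tr(I - H^[j](lam))."""
    n = len(y)
    H = X @ np.linalg.solve(X.T @ X + lam * Q, X.T)   # H(lam)
    h = np.diag(H)
    e = y - H @ y
    e_del = e / (1.0 - h)                             # eq. (3.8)
    t = np.empty(n)
    sigma2_del = np.empty(n)
    df = np.empty(n)
    for j in range(n):
        others = np.arange(n) != j
        # Numerator terms of (4.6): residuals of the other points, deleted fit.
        resid = e[others] + H[others, j] * e_del[j]
        # Eq. (4.7): trace of I - H^[j](lam).
        df[j] = (n - 1) - np.sum(h[others] + H[others, j] ** 2 / (1.0 - h[j]))
        sigma2_del[j] = np.sum(resid ** 2) / df[j]
        t[j] = e[j] / np.sqrt(sigma2_del[j] * (1.0 - h[j]))   # eq. (4.5)
    return t, sigma2_del, df
```

The returned `df` values are the approximate degrees of freedom, which are generally not integers when $\lambda > 0$.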

The studentized residuals along with formulas (4.6)-(4.7) are
generalizations of relations which hold when $\lambda = 0$ (e.g., Gunst and
Mason 1980, Chapter 7). These residuals provide a scaled measure
of how the fit to $y_j$ changes when its value is not used to estimate
$\beta$. They can, therefore, be used to detect overly influential data
values. The value of $t_{[j]}$ might be compared to values from a Student's
t distribution with approximately $\mathrm{tr}(I - H^{[j]}(\lambda))$ degrees of freedom.
Simulation results discussed in Section 5 indicate that Student's t
critical values provide a reasonably good approximation for 5% cutoff
values for the $t_{[j]}$. Through similar considerations a variety of
other diagnostic measures can also be suggested. One such example
is

$$\mathrm{DFFITS}_j = [h_{jj}(\lambda)/(1 - h_{jj}(\lambda))]^{1/2}\,t_{[j]}, \quad j = 1, \ldots, n,$$

(see Velleman and Welsch 1981 or Belsley, Kuh and Welsch 1980).

Deleting $q \ge 2$ observations is somewhat more complicated than
the case $q = 1$. When $q \ge 2$ it is no longer obvious that equations
(3.3) always uniquely determine the $\tilde y_j^{(J)}$. This will be true if
and only if $(I - H(\lambda))_J$, the submatrix of $I - H(\lambda)$ corresponding to those
indices in $J$, has rank $q$. For example, when $q = 2$, $J = \{i, j\}$ this
condition is equivalent to $(1 - h_{ii}(\lambda))(1 - h_{jj}(\lambda)) - h_{ij}(\lambda)^2 \ne 0$.
Instances where this is not satisfied would seem rare in practice.

Now suppose that one obtains $m$ random samples of $q$ indices
each, $J_1, \ldots, J_m$, by sampling with replacement from $\{1, \ldots, n\}$. A
bootstrap estimator of the variance-covariance matrix of $\tilde\beta$ is
provided by

$$W = \sum_{r=1}^{m}(\tilde\beta^{(J_r)} - \tilde\beta^*)(\tilde\beta^{(J_r)} - \tilde\beta^*)'/(m-1) \tag{4.8}$$

where $\tilde\beta^* = m^{-1}\sum_{r=1}^{m}\tilde\beta^{(J_r)}$. If the matrices $(I - H(\lambda))_{J_r}$ all have rank $q$,
$W$ can be computed using equations (3.2)-(3.3) and its elements can
then be used to obtain bootstrap analogs of the jackknife confidence
intervals (4.3). A similar approach when all possible subsets of
size $q$ are used leads to the development of grouped jackknife
interval estimates of $\beta$ (see Efron 1982, Chapter 2).

To conclude note that when $\lambda = 0$ Theorems 3.1-3.2 can be
used to establish "leave-q-out" identities such as equation (7)
of Draper and John (1981). It is, therefore, possible to generalize
leave-q-out diagnostics such as those discussed in Gentleman and
Wilk (1975a, b) and Draper and John (1978, 1981) to the case of
penalized least squares estimation.


5. EXAMPLES


In this section the application of results in Sections 3 and 
4 to the special cases of smoothing splines and ridge regression 
will be illustrated. 

5.1 Smoothing Splines 

Suppose $\eta$ is a smooth response function and that $x_j'\beta = \eta(t_j)$,
$0 \le t_1 < \cdots < t_n \le 1$, in model (1.1). For $n \ge m$ the smoothing
spline estimator of $\eta$, denoted by $\tilde\eta$, is obtained by minimizing

$$\sum_{j=1}^{n}(y_j - f(t_j))^2 + \lambda\int_0^1 f^{(m)}(t)^2\,dt \tag{5.1}$$

over all functions $f$ having $m-1$ absolutely continuous derivatives
and a square integrable $m$th derivative. Schoenberg (1964) proposed
this type of nonparametric estimator for $\eta$ and showed that $\tilde\eta$ was a
spline function of order $2m$ with knots at the $t_j$. General
discussions of smoothing splines can be found in Wahba (1977), Wegman
and Wright (1983) and Eubank (1983).

Demmler and Reinsch (1975) (see also Speckman 1983) develop
a basis for spline smoothing which consists of functions $x_1, \ldots, x_n$
and constants $0 = q_1 = \cdots = q_m < q_{m+1} \le \cdots \le q_n$ which satisfy

$$\sum_{r=1}^{n} x_i(t_r)x_j(t_r) = \delta_{ij} \tag{5.2}$$

and

$$\int_0^1 x_i^{(m)}(t)x_j^{(m)}(t)\,dt = q_i\delta_{ij}, \tag{5.3}$$

where $\delta_{ij}$ is the Kronecker delta. They show that the minimizer
of criterion (5.1) is necessarily of the form

$$f(t) = \sum_{r=1}^{n}\beta_r x_r(t); \tag{5.4}$$

hence, it suffices to minimize criterion (5.1) over functions of
this type. Substituting $f(t)$ from (5.4) into (5.1) and invoking
the relationships in equation (5.3) gives the equivalent criterion

$$\min_{\beta}\ \sum_{j=1}^{n}\Big(y_j - \sum_{r=1}^{n}\beta_r x_r(t_j)\Big)^2 + \lambda\sum_{r=1}^{n} q_r\beta_r^2. \tag{5.5}$$

Comparison with (1.2) reveals this to be a special case of penalized
least squares estimation with $p = n$, $x_j' = (x_1(t_j), \ldots, x_n(t_j))$ and
$Q = \mathrm{diag}(q_1, \ldots, q_n)$. Therefore,

$$\tilde\beta = D(\lambda)X'y \tag{5.6}$$

where $D(\lambda) = \mathrm{diag}((1 + \lambda q_1)^{-1}, \ldots, (1 + \lambda q_n)^{-1})$.

The hat matrix corresponding to the estimator (5.6) is
$H(\lambda) = XD(\lambda)X'$; moreover, since $X'X = I$ the eigenvalues of $(X'X)^{-1}Q$
are simply the $q_r$. Applying Corollary 1 of Section 2
the following bounds are obtained for $h_{ii}(\lambda)$:

$$h_{ii}(\infty) + (1 + \lambda q_n)^{-1}\sum_{r=m+1}^{n} x_r(t_i)^2 \le h_{ii}(\lambda) \le h_{ii}(\infty) + (1 + \lambda q_{m+1})^{-1}\sum_{r=m+1}^{n} x_r(t_i)^2, \tag{5.7}$$

where $h_{ii}(\infty) = \sum_{r=1}^{m} x_r(t_i)^2$. It follows from Demmler and Reinsch (1975)
that $h_{ii}(\infty)$ is the $i$th leverage value for regression on polynomials of
order $m$. Equation (5.7) therefore establishes a connection between
the leverage values for spline smoothing and those for polynomial
regression. These results generalize to multivariate "Thin Plate"
or Laplacian smoothing splines (e.g., Wahba 1981; Wahba and
Wendelberger 1980; and Wendelberger 1981) where the $h_{ii}(\lambda)$ may be
particularly useful in the detection of sensitive points in the design.
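The diagonal structure of (5.6) and the bounds (5.7) can be illustrated without constructing an actual Demmler-Reinsch basis. In the sketch below a random orthogonal matrix stands in for the basis matrix (so that $X'X = I$), and the constants $q_r$ are arbitrary values with $q_1 = \cdots = q_m = 0$; both are assumptions made purely for illustration, as are the function names.

```python
import numpy as np

def spline_type_fit(X, y, q, lam):
    """beta = D(lam) X'y and leverages h_ii(lam), assuming X'X = I."""
    shrink = 1.0 / (1.0 + lam * q)     # diagonal of D(lam)
    beta = shrink * (X.T @ y)
    leverage = (X ** 2) @ shrink       # sum_r x_r(t_i)^2 / (1 + lam q_r)
    return beta, leverage

def leverage_bounds(X, q, lam, m):
    """Lower and upper bounds of the form (5.7); q must be in ascending order."""
    h_inf = np.sum(X[:, :m] ** 2, axis=1)      # unpenalized part, h_ii(inf)
    rest = np.sum(X[:, m:] ** 2, axis=1)
    lower = h_inf + rest / (1.0 + lam * q[-1])
    upper = h_inf + rest / (1.0 + lam * q[m])
    return lower, upper

# Stand-in inputs (hypothetical): orthogonal X, and q with m leading zeros.
rng = np.random.default_rng(6)
n, m, lam = 8, 2, 0.5
X, _ = np.linalg.qr(rng.normal(size=(n, n)))
q = np.concatenate([np.zeros(m), np.sort(rng.uniform(0.5, 20.0, size=n - m))])
y = rng.normal(size=n)
beta, lev = spline_type_fit(X, y, q, lam)
lower, upper = leverage_bounds(X, q, lam, m)
```

Because $D(\lambda)$ is diagonal, no linear system has to be solved once the basis is available; each coefficient is simply shrunk by $(1 + \lambda q_r)^{-1}$.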

To illustrate the behaviour of some of the diagnostic and
inferential methods proposed in Section 4, a small scale simulation
was conducted. Data sets were generated from model (1.1) with

$$\eta_i = \eta(t_i) = 4.26\{\exp(-3.25t_i) - 4\exp(-6.5t_i) + 3\exp(-9.75t_i)\},$$

$$t_i = (i-1)/n, \quad n = 80,$$

and normal errors with $\sigma$ values of .05, .1, .2 and .4. This
function is a rescaled version of one studied by Wahba and Wold (1975).
The basic experiment was replicated $r = 50$ times (i.e., 50 data sets
of size 80) with each replicate being "treated" by all four values
of $\sigma$. A cubic smoothing spline ($m = 2$) was fitted to each data set
with $\lambda$ selected via GCV.
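A data set of the kind described can be generated as in the sketch below. It assumes that the leading constant of the test function is 4.26, as in the Wahba and Wold function, and that the errors are independent normal deviates; the function names and the seed are hypothetical, and neither the spline fit nor the GCV choice of $\lambda$ is reproduced here.

```python
import math
import random

def eta(t):
    """Rescaled test function on [0, 1], with assumed leading constant 4.26."""
    return 4.26 * (math.exp(-3.25 * t) - 4.0 * math.exp(-6.5 * t)
                   + 3.0 * math.exp(-9.75 * t))

def simulate(n=80, sigma=0.1, seed=0):
    """Return design points t_i = (i-1)/n and noisy responses y_i."""
    rng = random.Random(seed)
    t = [(i - 1) / n for i in range(1, n + 1)]
    y = [eta(ti) + rng.gauss(0.0, sigma) for ti in t]
    return t, y

t, y = simulate()   # one replicate with sigma = 0.1
```

Calling `simulate` with `sigma` set to .05, .1, .2 and .4 and with 50 different seeds would mimic the layout of the experiment, although the report's exact random streams cannot be recovered.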

Approximate 95% jackknife confidence intervals for the $\eta_i$,
centered at $\tilde\eta_i$, were computed by taking $a' = (x_1(t_i), \ldots, x_n(t_i))$
in equation (4.3). The proportion of times the true function value
was contained in its interval estimate was recorded along with the
value of $\tilde\sigma^2$ and the proportion of times $|t_{[j]}|$ exceeded the 5%
(two-tailed) critical value for the Student t distribution. Summary
statistics for the simulation are given in Table 1. A typical
example of these results, for $\sigma = .1$, appears in Figure 1.

[Insert Table 1, Figure 1] 

The empirical confidence levels in Table 1 are somewhat lower 
than might be desired. However, by using 99% rather than 95% inter- 
vals, confidence levels in excess of 94% were obtained in all cases. 
This is typical of simulations performed with other function types 
and other configurations for the values of $r$, $n$, and $\sigma$. These
results will appear elsewhere. As illustrated in Table 1, the
Student's t approximation to $t_{[j]}$ and the estimator $\tilde\sigma^2$ performed
well.

5.2 Ridge Regression 

Ridge regression estimators (Hoerl and Kennard 1970; Marquardt 
1970) are solutions to the criterion function (1.2) when (i) only 
the nonconstant predictor variables from model (1.1) are included 
in X, (ii) the predictor variables are standardized so that X'X is 
in correlation form, and (iii) $Q = I$. Much controversy persists over
automated selection of $\lambda$, the effect of standardization on ridge
estimation, and the assumptions underlying the validity of the
ridge estimator (e.g., Draper and Van Nostrand 1979; Smith and
Campbell 1980, with discussion). In order to demonstrate the
application of the results of Sections 2-4, assume that for a
specific regression analysis the criticisms noted above are 
satisfactorily answered and that a ridge regression analysis is 
deemed appropriate. 

Ridge regression diagnostics can be obtained from the results 
of Sections 2-4 under the conditions stated above; however, the 
efficient computational expressions for deleted estimators (i.e.,
$\tilde\beta^{[j]}$ and $\tilde\beta^{(J)}$) and deleted residuals (i.e., $e_{[j]}$) are exact only
if the reduced $X$ matrix is not restandardized when rows are deleted.
Hinkley (1977a) noted a similar restriction when he cautioned against
obtaining (least squares) jackknife estimates of the constant term
of a regression model using centered predictor variables. Since
the major benefits of centering and standardization cited by
Marquardt (1980) are essentially maintained when one (or a small
number) of the rows of the standardized $X$ matrix is (are) deleted,
only the original matrix of predictor variables is standardized
in the following example.

Gunst and Mason (1980, Appendix A) contains a data set on the
gross national product (GNP) of 49 countries of the world along with
six additional socioeconomic indices: an infant death rate (INFD),
a physician/population ratio (PHYS), population density (DENS),
population density measured in terms of agricultural land area (AGDS),
a literacy measure (LIT), and an index of higher education (HIED).
Table 2 displays regression diagnostics for the fit of ln(GNP) by
the six socioeconomic indices.

[Insert Table 2] 

The relatively small value of $\lambda$ (0.08) which was chosen for this
illustration has little effect on the bound for ridge leverage values
given by Corollary 2 since $\ell_p/(\ell_p + 0.08) = 0.97$. With the exception of
Malta, least squares leverage values which exceed $2(p+1)/n = 0.286$
are also large with the ridge estimator using the analogous bound
$2(\mathrm{tr}[H(\lambda)] + 1)/n = 0.271$. Although the ridge DFFITS values appear to
be slightly more uniform than those of least squares (e.g., none of
the former are greater than 1.0 in magnitude), four of the five
observations which exceed $2\{(p+1)/n\}^{1/2} = 0.756$ for least squares also
exceed $2\{(\mathrm{tr}[H(\lambda)] + 1)/n\}^{1/2} = 0.736$ for ridge regression (Malta is again
the exception), and a similar comment can be made about the $t_{[j]}$.
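The cutoffs quoted above follow from simple arithmetic with $n = 49$ and $p = 6$, and the two ridge cutoffs can be checked against each other. The sketch below is only a consistency check; it also recomputes the DFFITS values for Malta from the leverage values and Studentized residuals listed in Table 2, using the DFFITS formula of Section 4. The variable and function names are hypothetical.

```python
import math

n, p = 49, 6
leverage_cutoff = 2 * (p + 1) / n               # least squares leverage cutoff
dffits_cutoff = 2 * math.sqrt((p + 1) / n)      # least squares DFFITS cutoff

# Back out tr[H(lambda)] + 1 from the reported ridge DFFITS cutoff 0.736,
# then recompute the ridge leverage cutoff from it.
trace_plus_one = n * (0.736 / 2) ** 2
ridge_leverage_cutoff = 2 * trace_plus_one / n

def dffits(h, t):
    """DFFITS = [h / (1 - h)]^{1/2} * t for leverage h and Studentized residual t."""
    return math.sqrt(h / (1.0 - h)) * t

malta_ls = dffits(0.688, 1.506)      # least squares entries for Malta, Table 2
malta_ridge = dffits(0.262, 0.426)   # ridge entries for Malta, Table 2
```

Rounded to three decimals these give 0.286, 0.756 and 0.271 for the cutoffs and 2.236 and 0.254 for Malta, in agreement with the text and Table 2; the implied value of $\mathrm{tr}[H(\lambda)]$ is about 5.64.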

Malta is obviously affecting the two estimation procedures 
differently. It has high leverage and is influential on the least 
squares fit but has neither high leverage nor an influential impact
on the ridge regression fit. A scatterplot of DENS and AGDS reveals
that Malta lies well off the concentrated linear scatter ($r = 0.97$)
between these two variates. Thus by lessening the effect of the
strong pairwise correlation between DENS and AGDS on the estima- 
tion of the regression coefficients, the ridge estimator is also 
lessening the influence of Malta on the fit. Although the other 
least squares and ridge diagnostics identify equally important 
characteristics of this data set, comparison of the two sets of 
diagnostics has provided important insight about Malta which might 
have gone unappreciated had only the least squares diagnostics 
been examined. 

Table 3 displays least squares, ridge ($\lambda = .08$), and
jackknifed ridge ($b$) coefficient estimates and confidence intervals.
The purpose of presenting the ridge and jackknifed ridge estimates
is to highlight typical characteristics of these estimators, not
to draw definitive conclusions relative to this data set. Note
in particular that, while similar, the ridge and jackknifed ridge
estimates are somewhat different. In addition, both of these
latter two estimators produce jackknife confidence intervals
(using expressions (4.3)) which are shorter than least squares.

In view of the simulation results in Section 5.1, it might be 
advisable to adjust these confidence intervals (not done here) 
by using a larger Student t critical point. If one uses 99% 
nominal coverage, the ridge confidence interval for the coeffi- 
cient of DENS includes the origin. 

Obviously a more complete analysis of this data set is needed
in order to resolve questions which remain about influential
observations and the significance of the predictor variables. Any thorough
analysis must incorporate prior knowledge about the regression
coefficients and information concerning the intended use of the 
conclusions which are to be drawn from the fitted model. These 
topics are beyond the scope of this paper; nevertheless, this 
example illustrates some important characteristics of penalized 
least squares diagnostics and approximate inference procedures. 

6. CONCLUDING REMARKS

The results of this paper generalize least squares regression
diagnostics and certain approximate inference procedures to a
class of (quadratic) penalized least squares estimators for linear
models. Theorems 3.1 and 3.2 produce expressions for deleted
estimators and residuals which provide exact, computationally efficient
calculation of quantities such as pseudo-values and Studentized
residuals. These results have wide application, two specific
illustrations being nonparametric estimation with smoothing splines
and ridge regression.

Much research remains to be conducted regarding the properties
and usage of the procedures proposed in this paper. For example,
the jackknife confidence intervals do not achieve the nominal
confidence level, although they are well-known to be insensitive to a
variety of unimodal error distributions. Corrections for the
jackknife such as those proposed in Hinkley (1977b, 1978) may alleviate
coverage difficulties and the behavior of jackknife intervals under
nonnormal errors merits further investigation. Likewise, the
sensitivity of jackknife confidence intervals to the choice of $\lambda$ warrants
further study. For instance, in the ridge regression example
increasing $\lambda$ from 0.08 to 0.20 decreases the estimated standard
errors of the individual coefficients between 5 percent (HIED)
and 50 percent (AGDS). On the other hand, the Studentized
residuals and the estimator of $\sigma^2$ performed well in the
simulation in Section 5.1. Similarly, the ridge regression diagnostics
highlighted an important characteristic of the presence of Malta
which could have been overlooked if only the least squares
diagnostics were examined.


TABLE 1. Summary Statistics for the Simulation

            Empirical Confidence     Empirical Significance      Estimated
                  Levels                     Levels               Variance

$\sigma$    Average   Std. Error     Average   Std. Error     Avg.     MSE

  .05        .8838      .0084         .0508      .0025        .0023    2 x 10^-7
  .10        .8868      .0087         .0510      .0024        .0091    3 x 10^-6
  .20        .8863      .0102         .0493      .0021        .0366    5 x 10^-5
  .40        .8843      .0149         .0490      .0023        .1490    6 x 10^-4


TABLE 2. Regression Diagnostics for GNP Data, Selected Observations

                     Least Squares                  Ridge ($\lambda = .08$)

Obsn.         $h_{jj}$   $t_{[j]}$   DFFITS     $h_{jj}(.08)$   $t_{[j]}$   DFFITS

BARBADOS        .238     -2.026     -1.131        .137        -1.929      -.769
CANADA          .042      2.011       .419        .039         2.111       .423
HONG KONG       .511      -.107      -.109        .471         -.138      -.130
INDIA           .558      1.337      1.502        .507          .903       .917
JAPAN           .049     -2.799      -.633        .046        -2.743      -.602
LUXEMBOURG      .084      2.356       .713        .077         2.391       .690
MALTA           .688      1.506      2.236        .262          .426       .254
SINGAPORE       .632       .562       .736        .516          .632       .653
TAIWAN          .178     -2.401     -1.119        .129        -2.475      -.953
U.S.            .490       .804       .787        .447          .951       .855


TABLE 3. Coefficient Estimates and Nominal 95% (Individual)
Confidence Intervals

Predictor     Least Squares        Ridge Regression      Jackknifed
Variable      Estimates            ($\lambda = .08$)     Ridge

(a) Coefficient Estimates

INFD          -1.870               -1.772                -1.695
PHYS            .171                -.125                  .113
DENS          -1.094                -.410                 -.606
AGDS            .862                 .151                  .453
LIT            2.298                1.985                 2.163
HIED           1.454                1.411                 1.662

(b) Confidence Intervals

INFD          (-3.012, -.729)      (-2.218, -1.326)      (-2.142, -1.250)
PHYS          (-1.192, 1.535)      (-.524, .274)         (-.286, .512)
DENS          (-4.718, 2.530)      (-.767, -.053)        (-.963, -.249)
AGDS          (-2.738, 4.462)      (-.188, .490)         (.114, .792)
LIT           (.748, 3.848)        (1.408, 2.562)        (1.586, 2.740)
HIED          (.528, 2.380)        (.994, 1.828)         (1.245, 2.079)





